Research Article

The Impact of Item Preknowledge on Scaling and Equating: Item Response Theory True and Observed Score Equating Methods

Year 2023, Volume: 14 Issue: 4, 455 - 471, 31.12.2023
https://doi.org/10.21031/epod.1199296

Abstract

Testing programs often reuse items, mainly because of the difficulty and expense of creating new ones. This practice poses a threat to test security if some or all test-takers gain knowledge of the items before taking the test. In this study, simulated data were used to assess the effect of item preknowledge on item response theory (IRT) true score and observed score equating. Root mean square error (RMSE) and bias were used to evaluate the recovery of equated scores and linking coefficients under two scaling methods. Results indicated that item preknowledge has a large effect on equated scores and linking coefficients. Furthermore, as the difference between the groups' mean ability distributions, the number of exposed items, and the number of examinees with item preknowledge increase, the bias and RMSE of the equated scores and linking coefficients also increase. Additionally, IRT true score equating yields larger bias and RMSE than IRT observed score equating. These findings suggest that item preknowledge can inflate equated scores, putting the validity of test scores at risk.
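To make the methodology concrete, here is a minimal sketch in R (the environment cited in the references) of mean/sigma scaling, IRT true score equating, and the bias and RMSE recovery criteria described above. All item parameters, score vectors, and function names are hypothetical toy values for illustration, not the study's actual simulation design; mean/sigma is shown only as one common example of the kind of scaling method examined.

    # A minimal sketch with hypothetical toy values throughout;
    # the study itself used the ltm and plink R packages.

    # Mean/sigma scaling: anchor-item difficulty estimates on the new-form
    # scale (b_new) and old-form scale (b_old) determine the slope A and
    # intercept B of the transformation theta_old = A * theta_new + B.
    b_new <- c(-1.2, -0.4, 0.1, 0.8, 1.5)   # hypothetical anchor difficulties
    b_old <- c(-1.0, -0.2, 0.3, 1.1, 1.7)
    A <- sd(b_old) / sd(b_new)
    B <- mean(b_old) - A * mean(b_new)

    # IRT true score equating: map a Form X true score to the theta that
    # produces it, then to the Form Y true score at that theta.
    # A 2PL response function is assumed here for simplicity.
    p2pl <- function(theta, a, b) 1 / (1 + exp(-1.7 * a * (theta - b)))
    true_score <- function(theta, a, b) sum(p2pl(theta, a, b))
    equate_true <- function(tx, aX, bX, aY, bY) {
      theta <- uniroot(function(t) true_score(t, aX, bX) - tx, c(-6, 6))$root
      true_score(theta, aY, bY)
    }
    # e.g., equate a Form X true score of 3 on these 5-item toy forms:
    equate_true(3, aX = rep(1, 5), bX = b_new, aY = rep(1, 5), bY = b_old)

    # Recovery criteria: bias and RMSE of estimates against a criterion,
    # e.g., equated scores under preknowledge vs. an uncompromised baseline.
    bias <- function(est, crit) mean(est - crit)
    rmse <- function(est, crit) sqrt(mean((est - crit)^2))

    eq_preknowledge <- c(10.6, 15.4, 21.0, 26.3, 31.9)  # toy equated scores
    eq_criterion    <- c(10.0, 15.0, 20.5, 25.8, 31.0)
    bias(eq_preknowledge, eq_criterion)  # positive bias: inflated scores
    rmse(eq_preknowledge, eq_criterion)

Note that R's sd() uses the n-1 denominator; presentations of the mean/sigma method that use population standard deviations differ only by a constant factor, which is immaterial for a sketch.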

Supporting Institution

TÜBİTAK

Project Number

2219 Yurt Dışı Doktora Sonrası Araştırma Burs Programı (TÜBİTAK 2219 International Postdoctoral Research Fellowship Program)

References

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
  • Angoff, W. H. (1984). Scales, norms, and equivalent scores. Educational Testing Service.
  • Barri, M. A. (2013). The impact of anchor item exposure on mean/sigma linking and IRT true score equating under the NEAT design [Unpublished master’s thesis]. University of Kansas.
  • Belov, D. I. (2016). Comparing the performance of eight item preknowledge detection statistics. Applied Psychological Measurement, 40(2), 83-97. https://doi.org/10.1177/0146621615603
  • Chen, D. F. (2021). Impact of item parameter drift on IRT linking methods [Unpublished doctoral dissertation]. The University of North Carolina.
  • Cizek, G. J. (1999). Cheating on tests: How to do it, detect it, and prevent it. Lawrence Erlbaum.
  • Cizek, G. J., & Wollack, J. A. (Eds.). (2017). Handbook of quantitative methods for detecting cheating on tests. Routledge.
  • Cook, L. L., & Eignor, D. R. (1991). IRT equating methods. Educational Measurement: Issues and Practice, 10(3), 37-45. https://doi.org/10.1111/j.1745-3992.1991.tb00207.x
  • de Ayala, R. J. (2009). The theory and practice of item response theory. Guilford Press.
  • Demir, M. K., & Arcagok, S. (2013). Sınıf öğretmeni adaylarının sınavlarda kopya çekilmesine ilişkin görüşlerinin değerlendirilmesi [Primary school teacher candidates’ opinions on cheating in exams]. Erzincan University Faculty of Education Journal, 15(1), 148-165. https://dergipark.org.tr/en/pub/erziefd/issue/6010/80121
  • Eckerly, C. A. (2017). Detecting preknowledge and item compromise. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 101-123). Routledge.
  • Fly, B. J. (1995). A study of ethical behaviour of students in graduate training programs in psychology [Unpublished doctoral dissertation]. University of Denver.
  • Foster, D. (2013). Security issues in technology-based testing. In J. A. Wollack & J. J. Fremer (Eds.), Handbook of test security (pp. 39–83). Routledge.
  • Gorney, K., & Wollack, J. A. (2022). Generating models for item preknowledge. Journal of Educational Measurement, 59(1), 22-42. https://doi.org/10.1111/jedm.12309
  • Han, T., Kolen, M., & Pohlmann, J. (1997). A comparison among IRT true and observed-score equatings and traditional equipercentile equating. Applied Measurement in Education, 10(2), 105-121. https://doi.org/10.1207/s15324818ame1002_1
  • Harris, D. J. (1993, April). Practical issues in equating [Paper presentation]. Annual meeting of the American Educational Research Association, Atlanta, GA, United States.
  • Josephson Institute. (2012). Josephson Institute’s 2012 report card on the ethics of American youth. http://charactercounts.org/programs/reportcard/2012/index.html
  • Jurich, D. P. (2011). The impact of cheating on IRT equating under the non-equivalent anchor test design [Unpublished master’s thesis]. James Madison University.
  • Jurich, D. P., Goodman, J. T., & Becker, K. A. (2010). Assessment of various equating methods: Impact on the pass-fail status of cheaters and non-cheaters [Poster presentation]. Annual meeting of the National Council on Measurement in Education, Denver, CO, United States.
  • Kane, M. T., & Mroch, A. A. (2020). Orthogonal regression, the Cleary criterion, and Lord's paradox: Asking the right questions. ETS Research Report Series, 2020(1), 1-24. https://doi.org/10.1002/ets2.12298
  • Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer.
  • Lee, S. Y. (2018). A mixture model approach to detect examinees with item preknowledge [Unpublished doctoral dissertation]. The University of Wisconsin-Madison.
  • Liu, J., & Becker, K. (2022). The impact of cheating on score comparability via pool‐based IRT pre‐equating. Journal of Educational Measurement, 59(2), 208-230. https://doi.org/10.1111/jedm.12321
  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum.
  • Man, K., Harring, J. R., & Sinharay, S. (2019). Use of data mining methods to detect test fraud. Journal of Educational Measurement, 56(2), 251-279. https://doi.org/10.1111/jedm.12208
  • Pan, Y., & Wollack, J. A. (2021). An unsupervised‐learning based approach to compromised items detection. Journal of Educational Measurement, 58(3), 413-433. https://doi.org/10.1111/jedm.12299
  • Qian, H., Staniewska, D., Reckase, M., & Woo, A. (2016). Using response time to detect item preknowledge in computer‐based licensure examinations. Educational Measurement: Issues and Practice, 35(1), 38-47. https://doi.org/10.1111/emip.12102
  • R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
  • Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25. https://doi.org/10.18637/jss.v017.i05
  • Shu, Z., Henson, R., & Luecht, R. (2013). Using deterministic, gated item response theory model to detect test cheating due to item compromise. Psychometrika, 78(3), 481-497. https://doi.org/10.1007/s11336-012-9311-3
  • Sinharay, S. (2017). Detection of item preknowledge using likelihood ratio test and score test. Journal of Educational and Behavioral Statistics, 42(1), 46-68. https://doi.org/10.3102/1076998616673872
  • Spence, P. D. (1996). The effect of multidimensionality on unidimensional equating with item response theory [Unpublished doctoral dissertation]. University of Florida.
  • Tan, Ş. (2001). Sınavlarda kopya çekmeyi önlemeye yönelik önlemler [Measures against cheating in exams]. Education and Science, 26(122), 32-40.
  • Wang, J., Tong, Y., Ling, M., Zhang, A., Hao, L., & Li, X. (2015). Analysis on test cheating and its solutions based on extenics and information technology. Procedia Computer Science, 55, 1009-1014. https://doi.org/10.1016/j.procs.2015.07.1024
  • Wang, T., Lee, W., Brennan, R. L., & Kolen, M. J. (2008). A comparison of the frequency estimation and chained equipercentile methods under the common-item nonequivalent groups design. Applied Psychological Measurement, 32, 632-651. https://doi.org/10.1177/0146621608314943
  • Weeks, J. P. (2010). plink: An R package for linking mixed-format tests using IRT-based methods. Journal of Statistical Software, 35(12), 1–33. https://doi.org/10.18637/jss.v035.i12
  • Zimmermann, S., Klusmann, D., & Hampe, W. (2016). Are exam questions known in advance? Using local dependence to detect cheating. PLoS ONE, 11(12), e0167545. https://doi.org/10.1371/journal.pone.0167545
  • Zopluoglu, C. (2017). Similarity, answer copying, and aberrance: Understanding the status quo. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 25–46). Routledge.

Details

Primary Language English
Journal Section Articles
Authors

Çiğdem Akın Arıkan 0000-0001-5255-8792

Allan Cohen 0000-0002-8776-9378

Project Number 2219 Yurt Dışı Doktora Sonrası Araştırma Burs Programı
Publication Date December 31, 2023
Acceptance Date October 25, 2023
Published in Issue Year 2023 Volume: 14 Issue: 4

Cite

APA Akın Arıkan, Ç., & Cohen, A. (2023). The Impact of Item Preknowledge on Scaling and Equating: Item Response Theory True and Observed Score Equating Methods. Journal of Measurement and Evaluation in Education and Psychology, 14(4), 455-471. https://doi.org/10.21031/epod.1199296