Research Article

The Impact of Item Preknowledge on Scaling and Equating: Item Response Theory True and Observed Score Equating Methods

Year 2023, Volume 14, Issue 4, 455-471, 31.12.2023
https://doi.org/10.21031/epod.1199296

Abstract

Testing programs often reuse items, mainly because of the difficulty and expense of creating new ones. Reuse poses a threat to test security if some or all test-takers have knowledge of the items before taking the test. In this study, simulated data were used to assess the effect of item preknowledge on item response theory (IRT) true score and observed score equating. Root mean square error (RMSE) and bias were used to evaluate the recovery of equated scores and linking coefficients under two scaling methods. The results indicated that item preknowledge has a large effect on equated scores and linking coefficients. Moreover, as the difference between the group mean ability distributions, the number of exposed items, and the number of examinees with item preknowledge increased, the bias and RMSE of the equated scores and linking coefficients also increased. In addition, IRT true score equating produced larger bias and RMSE than IRT observed score equating. These findings suggest that item preknowledge can inflate equated scores, putting the validity of the test scores at risk.
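
For readers less familiar with the quantities named in the abstract, the following is a minimal sketch, in the standard notation of Kolen and Brennan (2014) rather than notation taken from the paper itself, of the recovery criteria and the equating and linking functions involved. Here \hat{e}_r(x) is the estimated equated score at raw score x in replication r of R, e(x) is the criterion equated score, P_j(\theta) is the item response function of item j on a J-item form, and the b are the common-item IRT difficulty parameters; the mean/sigma coefficients are shown as one typical choice of scaling method.

% Recovery criteria, averaged over R replications:
\mathrm{Bias}(x) = \frac{1}{R}\sum_{r=1}^{R}\bigl(\hat{e}_r(x)-e(x)\bigr), \qquad
\mathrm{RMSE}(x) = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\bigl(\hat{e}_r(x)-e(x)\bigr)^{2}}

% IRT true score equating maps a Form X number-correct score x to Form Y
% through the test characteristic curve \tau(\theta), the expected
% number-correct score at ability \theta:
\tau(\theta) = \sum_{j=1}^{J} P_j(\theta), \qquad
e_Y(x) = \tau_Y\bigl(\tau_X^{-1}(x)\bigr)

% Mean/sigma linking places Form X parameters on the Form Y scale via
% \theta_Y = A\theta_X + B, with coefficients
A = \frac{\sigma(b_Y)}{\sigma(b_X)}, \qquad B = \mu(b_Y) - A\,\mu(b_X)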

Project Number

2219 Yurt Dışı Doktora Sonrası Araştırma Burs Programı [TÜBİTAK 2219 International Postdoctoral Research Fellowship Program]

References

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
  • Angoff, W. H. (1984). Scales, norms, and equivalent scores. Educational Testing Service.
  • Barri, M. A. (2013). The impact of anchor item exposure on mean/sigma linking and IRT true score equating under the NEAT design [Unpublished master’s thesis]. University of Kansas.
  • Belov, D. I. (2016). Comparing the performance of eight item preknowledge detection statistics. Applied Psychological Measurement, 40(2), 83-97. https://doi.org/10.1177/0146621615603
  • Chen, D. F. (2021). Impact of item parameter drift on IRT linking methods [Unpublished doctoral dissertation]. The University of North Carolina.
  • Cizek, G. J. (1999). Cheating on tests: How to do it, detect it, and prevent it. Lawrence Erlbaum.
  • Cizek, G. J., & Wollack, J. A. (Eds.). (2017). Handbook of quantitative methods for detecting cheating on tests. Routledge.
  • Cook, L. L., & Eignor, D. R. (1991). IRT equating methods. Educational Measurement: Issues and Practice, 10(3), 37-45. https://doi.org/10.1111/j.1745-3992.1991.tb00207.x
  • de Ayala, R. J. (2009). The theory and practice of item response theory. Guilford Press.
  • Demir, M. K., & Arcagok, S. (2013). Sınıf öğretmeni adaylarının sınavlarda kopya çekilmesine ilişkin görüşlerinin değerlendirilmesi [Evaluation of primary school teacher candidates’ opinions on cheating in exams]. Erzincan University Faculty of Education Journal, 15(1), 148-165. Retrieved from https://dergipark.org.tr/en/pub/erziefd/issue/6010/80121
  • Eckerly, C. A. (2017). Detecting preknowledge and item compromise. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 101-123). Routledge.
  • Fly, B. J. (1995). A study of ethical behaviour of students in graduate training programs in psychology [Unpublished doctoral dissertation]. University of Denver.
  • Foster, D. (2013). Security issues in technology-based testing. In J. A. Wollack & J. J. Fremer (Eds.), Handbook of test security (pp. 39–83). Routledge.
  • Gorney, K., & Wollack, J. A. (2022). Generating models for item preknowledge. Journal of Educational Measurement, 59(1), 22-42. https://doi.org/10.1111/jedm.12309
  • Han, T., Kolen, M., & Pohlmann, J. (1997). A comparison among IRT true and observed-score equatings and traditional equipercentile equating. Applied Measurement in Education, 10(2), 105-121. https://doi.org/10.1207/s15324818ame1002_1
  • Harris, D. J. (1993, April). Practical issues in equating [Paper presentation]. Annual meeting of the American Educational Research Association, Atlanta, GA, USA.
  • Josephson Institute. (2012). Josephson Institute’s 2012 report card on the ethics of American youth. Los Angeles, CA. Retrieved from http://charactercounts.org/programs/reportcard/2012/index.html
  • Jurich, D. P. (2011). The impact of cheating on IRT equating under the non-equivalent anchor test design [Unpublished master’s thesis]. James Madison University.
  • Jurich, D. P., Goodman, J. T., & Becker, K. A. (2010). Assessment of various equating methods: Impact on the pass-fail status of cheaters and non-cheaters [Poster presentation]. Annual meeting of the National Council on Measurement in Education, Denver, CO, USA.
  • Kane, M. T., & Mroch, A. A. (2020). Orthogonal regression, the Cleary criterion, and Lord's paradox: Asking the right questions. ETS Research Report Series, 2020(1), 1-24. https://doi.org/10.1002/ets2.12298
  • Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer.
  • Lee, S. Y. (2018). A mixture model approach to detect examinees with item preknowledge [Unpublished doctoral dissertation]. The University of Wisconsin-Madison.
  • Liu, J., & Becker, K. (2022). The impact of cheating on score comparability via pool‐based IRT pre‐equating. Journal of Educational Measurement, 59(2), 208-230. https://doi.org/10.1111/jedm.12321
  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum.
  • Man, K., Harring, J. R., & Sinharay, S. (2019). Use of data mining methods to detect test fraud. Journal of Educational Measurement, 56(2), 251-279. https://doi.org/10.1111/jedm.12208
  • Pan, Y., & Wollack, J. A. (2021). An unsupervised‐learning based approach to compromised items detection. Journal of Educational Measurement, 58(3), 413-433. https://doi.org/10.1111/jedm.12299
  • Qian, H., Staniewska, D., Reckase, M., & Woo, A. (2016). Using response time to detect item preknowledge in computer‐based licensure examinations. Educational Measurement: Issues and Practice, 35(1), 38-47. https://doi.org/10.1111/emip.12102
  • R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org/
  • Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response analysis. Journal of Statistical Software, 17(5), 1–25. https://doi.org/10.18637/jss.v017.i05
  • Shu, Z., Henson, R., & Luecht, R. (2013). Using deterministic, gated item response theory model to detect test cheating due to item compromise. Psychometrika, 78(3), 481-497. https://doi.org/10.1007/s11336-012-9311-3
  • Sinharay, S. (2017). Detection of item preknowledge using likelihood ratio test and score test. Journal of Educational and Behavioral Statistics, 42(1), 46-68. https://doi.org/10.3102/1076998616673872
  • Spence, P. D. (1996). The effect of multidimensionality on unidimensional equating with item response theory [Unpublished doctoral dissertation]. University of Florida.
  • Tan, Ş. (2001). Sınavlarda kopya çekmeyi önlemeye yönelik önlemler [Measures against cheating in exams]. Education and Science, 26(122), 32-40.
  • Wang, J., Tong, Y., Ling, M., Zhang, A., Hao, L., & Li, X. (2015). Analysis on test cheating and its solutions based on extenics and information technology. Procedia Computer Science, 55, 1009-1014. https://doi.org/10.1016/j.procs.2015.07.1024
  • Wang, T., Lee, W., Brennan, R. L., & Kolen, M. J. (2008). A comparison of the frequency estimation and chained equipercentile methods under the common-item nonequivalent groups design. Applied Psychological Measurement, 32, 632-651. https://doi.org/10.1177/0146621608314943
  • Weeks, J. P. (2010). plink: An R package for linking mixed-format tests using IRT-based methods. Journal of Statistical Software, 35(12), 1–33. https://doi.org/10.18637/jss.v035.i12
  • Zimmermann, S., Klusmann, D., & Hampe, W. (2016). Are exam questions known in advance? Using local dependence to detect cheating. PLoS ONE, 11(12), e0167545. https://doi.org/10.1371/journal.pone.0167545
  • Zopluoglu, C. (2017). Similarity, answer copying, and aberrance: Understanding the status quo. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 25–46). Routledge.
A total of 38 references.

Supporting Institution

TÜBİTAK

Details

Primary Language: English
Section: Articles
Authors:

Çiğdem Akın Arıkan (ORCID: 0000-0001-5255-8792)

Allan Cohen (ORCID: 0000-0002-8776-9378)

Project Number: 2219 Yurt Dışı Doktora Sonrası Araştırma Burs Programı
Publication Date: December 31, 2023
Acceptance Date: October 25, 2023
Published in Issue: Year 2023, Volume 14, Issue 4

Cite

APA Akın Arıkan, Ç., & Cohen, A. (2023). The Impact of Item Preknowledge on Scaling and Equating: Item Response Theory True and Observed Score Equating Methods. Journal of Measurement and Evaluation in Education and Psychology, 14(4), 455-471. https://doi.org/10.21031/epod.1199296