Research Article

Robustness of Computer Adaptive Tests to the Presence of Item Preknowledge: A Simulation Study

Year 2024, Volume: 15 Issue: 2, 138 - 147, 30.06.2024
https://doi.org/10.21031/epod.1470949

Abstract

Item preknowledge describes a scenario in which some candidates have access to some of the test items before the test is administered. It typically involves the sharing of test materials and/or answers, and it is difficult to identify either the individuals with preknowledge or the compromised test materials. Nevertheless, it is essential to investigate the item preknowledge problem because it can significantly threaten the validity of test results. Traditional linear tests are generally believed to be more robust to this type of aberrant response behavior than adaptive tests. In this context, the aim of this study is to examine the effect of item preknowledge on computer adaptive tests and to identify the conditions under which adaptive tests are most resistant to it. For this purpose, a Monte Carlo simulation study was performed in which 28 different conditions were examined. The results indicated that the EAP estimation method provided better measurement precision than ML across all conditions. When the 2PL and 3PL IRT models were compared, the 2PL model had higher precision under most conditions; however, when the aberrancy ratio reached 20% for both individuals and items, the 3PL model outperformed the 2PL model and gave the best results when combined with EAP. The results were discussed in relation to the literature on item preknowledge and CAT, and implications for practitioners and further research were provided.
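
To make the simulation design above more concrete, the following is a minimal, from-scratch sketch in Python of a single replication of this kind of study: a 2PL item bank, a fixed-length CAT with maximum-information item selection and EAP scoring, and item preknowledge injected by forcing correct responses on a compromised subset of the bank. It is illustrative only; the study itself was run in R with the catR package, and the bank size, test length, 20% compromise rate, quadrature grid, and parameter distributions used here are assumptions, not the authors' exact settings.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 2PL item bank (illustrative, not the study's bank):
# discriminations roughly around 1, difficulties standard normal.
n_items = 300
a = rng.lognormal(mean=0.0, sigma=0.25, size=n_items)
b = rng.normal(0.0, 1.0, size=n_items)

def p_correct(theta, a_i, b_i):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a_i * (theta - b_i)))

def item_info(theta):
    """2PL Fisher information of every item in the bank at theta."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def eap_estimate(responses, items, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate with a standard normal prior over a quadrature grid."""
    a_adm, b_adm = a[items], b[items]
    posterior = np.empty_like(grid)
    for j, theta in enumerate(grid):
        p = p_correct(theta, a_adm, b_adm)
        log_lik = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
        posterior[j] = np.exp(log_lik - theta ** 2 / 2)  # likelihood x N(0,1) prior kernel
    posterior /= posterior.sum()
    return float(np.sum(grid * posterior))

def simulate_cat(theta_true, compromised, test_length=20):
    """Fixed-length CAT with maximum-information selection and EAP scoring.
    Items in `compromised` are always answered correctly (item preknowledge)."""
    administered, responses = [], []
    theta_hat = 0.0
    for _ in range(test_length):
        info = item_info(theta_hat)
        info[administered] = -np.inf              # never readminister an item
        item = int(np.argmax(info))
        if item in compromised:                   # preknowledge: guaranteed correct
            resp = 1
        else:
            resp = int(rng.random() < p_correct(theta_true, a[item], b[item]))
        administered.append(item)
        responses.append(resp)
        theta_hat = eap_estimate(np.array(responses), administered)
    return theta_hat

# One examinee with preknowledge of a randomly compromised 20% of the bank.
compromised = set(rng.choice(n_items, size=int(0.2 * n_items), replace=False).tolist())
theta_true = float(rng.normal())
theta_hat = simulate_cat(theta_true, compromised)
print(f"true theta = {theta_true:+.2f}, EAP estimate = {theta_hat:+.2f}")
```

Repeating such a replication over many simulees, crossing the manipulated factors (IRT model, ability estimation method, proportions of compromised items and of examinees with preknowledge), and summarizing bias and RMSE of the final ability estimates would mirror the kind of Monte Carlo design summarized in the abstract.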

References

  • Ackerman, T., Gierl, M. J., & Walker, C. M. (2003). Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22(3), 37–51. https://doi.org/10.1111/j.1745-3992.2003.tb00136.x
  • Belov, D. I. (2014). Detecting item preknowledge in computerized adaptive testing using information theory and combinatorial optimization. Journal of Computerized Adaptive Testing, 2(3), 37–58. https://doi.org/10.7333/jcat.v2i0.36
  • Belov, D. I. (2016). Comparing the performance of eight item preknowledge detection statistics. Applied Psychological Measurement, 40(2), 83-97. https://doi.org/10.1177/0146621615603327
  • Clark, J. M. (2010). Aberrant response patterns as a multidimensional phenomenon: Using factor-analytic model comparison to detect cheating [Unpublished doctoral dissertation, University of Kansas]. ProQuest Dissertations and Theses Global.
  • Eckerly, C. A. (2017). Detecting item preknowledge and item compromise: Understanding the status quo. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 101-123). Routledge.
  • Guo, F. (2009). Quantifying the impact of compromised items in CAT. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. www.psych.umn.edu/psylabs/CATCentral/
  • Guo, J., Tay, L., & Drasgow, F. (2009). Conspiracies and test compromise: An evaluation of the resistance of test systems to small-scale cheating. International Journal of Testing, 9(4), 283–309. https://doi.org/10.1080/15305050903351901
  • Jia, B., Zhang, X., & Zhu, Z. (2019). A short note on aberrant responses bias in item response theory. Frontiers in Psychology, 10. https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00043
  • Kim, S., & Moses, T. (2016). Investigating robustness of item response theory proficiency estimators to atypical response behaviors under two-stage multistage testing. ETS Research Report Series, 2016(2), 1–23. https://doi.org/10.1002/ets2.12111
  • Liu, T., Sun, Y., Li, Z., & Xin, T. (2019). The impact of aberrant response on reliability and validity. Measurement: Interdisciplinary Research and Perspectives, 17(3), 133–142. https://doi.org/10.1080/15366367.2019.1584848
  • Magis, D. (2014). On the asymptotic standard error of a class of robust estimators of ability in dichotomous item response models. British Journal of Mathematical and Statistical Psychology, 67(3), 430–450. https://doi.org/10.1111/bmsp.12027
  • Magis, D., Raiche, G., & Barrada, J. R. (2022). catR: Generation of IRT response patterns under computerized adaptive testing (R package version 3.17). https://cran.r-project.org/web/packages/catR
  • McLeod, L., Lewis, C., & Thissen, D. (2003). A Bayesian method for the detection of item preknowledge in computerized adaptive testing. Applied Psychological Measurement, 27(2), 121-137. https://doi.org/10.1177/0146621602250534
  • Meijer, R. R. (1996). Person-Fit research: An introduction. Applied Measurement in Education, 9, 3–8. https://doi.org/10.1207/s15324818ame0901_2
  • Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107–135. https://doi.org/10.1177/01466210122031957
  • Pan, Y., Sinharay, S., Livne, O., & Wollack, J. A. (2022). A machine learning approach for detecting item compromise and preknowledge in computerized adaptive testing. Psychological Test and Assessment Modeling, 64(4), 385-424. https://doi.org/10.31234/osf.io/hk35a
  • Qian, H., Staniewska, D., Reckase, M., & Woo, A. (2016). Using response time to detect item preknowledge in computer‐based licensure examinations. Educational Measurement: Issues and Practice, 35(1), 38-47. https://doi.org/10.1111/emip.12102
  • R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
  • Rios, J. A., Guo, H., Mao, L., & Liu, O. L. (2017). Evaluating the impact of noneffortful responses on aggregated scores: To filter unmotivated examinees or not? International Journal of Testing, 17(1), 74–104. https://doi.org/10.1080/15305058.2016.1231193
  • Tendeiro, J. N., & Meijer, R. R. (2014). Detection of invalid test scores: The usefulness of simple nonparametric statistics. Journal of Educational Measurement, 51(3), 239–259. https://doi.org/10.1111/jedm.12046
  • Wan, S., & Keller, L. A. (2023). Using cumulative sum control chart to detect aberrant responses in educational assessments. Practical Assessment, Research and Evaluation, 28(2). https://doi.org/10.7275/pare.1257
  • Wang, K. (2017). A fair comparison of the performance of computerized adaptive testing and multistage adaptive testing (Publication No. 10273809). [Doctoral Dissertation, Michigan State University]. ProQuest Dissertations & Theses.
  • Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361–375. https://doi.org/10.1111/j.1745-3984.1984.tb01040.x
  • Wollack, J. A., & Maynes, D. D. (2017). Detection of test collusion using cluster analysis. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 124–150). Routledge.
  • Yan, D., von Davier, A. A., & Lewis, C. (Eds.). (2014). Computerized multistage testing: Theory and applications. CRC Press.
  • Yen, Y. C., Ho, R. G., Liao, W. W., Chen, L. J., & Kuo, C. C. (2012). An empirical evaluation of the slip correction in the four parameter logistic models with computerized adaptive testing. Applied Psychological Measurement, 36(2), 75–87. https://doi.org/10.1177/0146621611432862
  • Yi, Q., Zhang, J., & Chang, H. H. (2008). Severity of organized item theft in computerized adaptive testing: A simulation study. Applied Psychological Measurement, 32(7), 543-558. https://doi.org/10.1177/0146621607311336
  • Zhang, J., Chang, H.-H., & Yi, Q. (2012). Comparing single-pool and multiple-pool designs regarding test security in computerized testing. Behavior Research Methods, 44, 742–752. https://doi.org/10.3758/s13428-011-0178-5
  • Zheng, Y., & Chang, H.-H. (2014). On-the-fly assembled multistage adaptive testing. Applied Psychological Measurement, 39(2), 104–118. https://doi.org/10.1177/0146621614544519

Details

Primary Language: English
Subjects: Testing, Assessment and Psychometrics (Other)
Journal Section: Articles
Authors

Hakan Kara (ORCID: 0000-0002-2396-3462)

Nuri Doğan (ORCID: 0000-0001-6274-2016)

Başak Erdem Kara (ORCID: 0000-0003-3066-2892)

Publication Date: June 30, 2024
Submission Date: April 19, 2024
Acceptance Date: June 6, 2024
Published in Issue: Year 2024, Volume: 15, Issue: 2

Cite

APA Kara, H., Doğan, N., & Erdem Kara, B. (2024). Robustness of Computer Adaptive Tests to the Presence of Item Preknowledge: A Simulation Study. Journal of Measurement and Evaluation in Education and Psychology, 15(2), 138-147. https://doi.org/10.21031/epod.1470949