Research Article

Detecting suspicious persons in multiple-choice tests: Comparison of performance of ω and GBT indexes

Year 2026, Volume: 13 Issue: 1, 95 - 107, 02.01.2026
https://doi.org/10.21449/ijate.1550949

Abstract

The COVID-19 pandemic has led to a widespread shift from traditional, supervised exams to unsupervised online testing environments, which has increased both the opportunities and the motivation for cheating. Such dishonest behaviors threaten the validity and fairness of test results, underscoring the critical importance of robust test security measures. This study focuses on detecting individuals suspected of cheating by using two widely recognized statistical indexes: the ω and Generalized Binomial Test (GBT) indexes. Both indexes were applied within two analytical frameworks: no-stage and two-stage methods. In the two-stage approach, the ω and GBT indexes were employed after potential cheaters had been flagged using the Kullback-Leibler (KL) divergence index and person-fit statistics (lz and lz*). To simulate realistic conditions, we manipulated key variables including test difficulty, the ability levels of suspected copiers, and the proportion of copied items. Our findings demonstrate that the GBT index combined with the KL index consistently outperformed the other methods across varying scenarios in terms of detection accuracy and control of false positives. These results suggest that integrating person-fit and distribution-based methods enhances the reliability of cheating detection in unsupervised testing environments. The study provides valuable insights for test administrators seeking effective statistical tools to safeguard test integrity, especially in the context of increasing online assessments.
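
For readers unfamiliar with the statistics named above, the following minimal Python sketch (ours, not the authors' code) illustrates what the key quantities compute: the lz person-fit statistic used as a first-stage screen, and the ω and GBT indexes computed for a suspected copier-source pair. It assumes the model-implied probabilities (of a correct response, and of a chance answer match on each item) have already been derived from a fitted IRT model; the function names and toy numbers are illustrative only.

# Illustrative sketch only -- not the authors' code. Shows, under simplified
# assumptions, the quantities named in the abstract: the lz person-fit
# statistic (first-stage screen) and the omega and GBT answer-copying
# indexes for one suspected copier-source pair.
from math import log, sqrt
from statistics import NormalDist


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic lz.

    responses: 0/1 scored item responses for one examinee.
    probs:     model-implied probabilities of a correct response
               (e.g., from a fitted 2PL/3PL model) for the same items.
    Large negative values indicate aberrant (misfitting) response patterns.
    """
    l0 = sum(u * log(p) + (1 - u) * log(1 - p) for u, p in zip(responses, probs))
    expected = sum(p * log(p) + (1 - p) * log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - expected) / sqrt(variance)


def omega_index(match_probs, observed_matches):
    """Omega (Wollack, 1997): standardized excess of observed answer matches
    over the number expected by chance, with an upper-tail normal p-value."""
    expected = sum(match_probs)
    sd = sqrt(sum(p * (1 - p) for p in match_probs))
    omega = (observed_matches - expected) / sd
    return omega, 1 - NormalDist().cdf(omega)


def gbt_index(match_probs, observed_matches):
    """GBT: exact probability of at least `observed_matches` matches under
    the generalized (Poisson) binomial distribution with item-specific
    chance-match probabilities."""
    dist = [1.0]                       # dist[k] = P(exactly k matches so far)
    for p in match_probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)   # this item does not match
            new[k + 1] += mass * p     # this item matches
        dist = new
    return sum(dist[observed_matches:])


if __name__ == "__main__":
    # Hypothetical 40-item pair: chance-match probabilities from an IRT
    # model, with 30 observed identical answers.
    match_probs = [0.35] * 20 + [0.50] * 20
    print(omega_index(match_probs, 30))   # (statistic, approximate p-value)
    print(gbt_index(match_probs, 30))     # exact p-value; flag if below alpha

In the two-stage designs compared in the study, pair-level indexes such as ω and GBT are computed only for examinees first flagged by a screening statistic (KL divergence, lz, or lz*), which limits the number of pairs tested and, per the abstract's findings, helps control false positives.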

Ethical Statement

Ankara University, 30.10.2020-134.

References

  • Armstrong, R., & Shi, M. (2009). Model-free CUSUM methods for person fit. Journal of Educational Measurement, 46, 408-428. https://doi.org/10.1111/j.1745-3984.2009.00090.x
  • Armstrong, R.D., Stoumbos, Z.G., Kung, M.T., & Shi, M. (2007). On the performance of the lz person-fit statistic. Practical Assessment, Research & Evaluation, 12. https://doi.org/10.7275/xz5d-7j62
  • Balta, E., & Dogan, C.D. (2024). Investigation of preknowledge cheating via joint hierarchical modeling patterns of response accuracy and response time. SAGE Open, 14(4), 1-15. https://doi.org/10.1177/21582440241297946
  • Belov, D.I., Pashley, P.J., Lewis, C., & Armstrong, R.D. (2007). Detecting aberrant responses with Kullback–Leibler distance. In K. Shigemasu, A. Okada, T. Imaizumi & T. Hoshino (Eds.), New trends in psychometrics (pp. 7-14). Universal Academy Press.
  • Belov, D.I., & Armstrong, R.D. (2010). Automatic detection of answer copying via Kullback-Leibler divergence and K-index. Applied Psychological Measurement, 34, 379-392. https://doi.org/10.1177/0146621610370453
  • Belov, D.I. (2013). Detection of test collusion via Kullback–Leibler divergence. Journal of Educational Measurement, 50, 141-163. https://doi.org/10.1111/jedm.12008
  • Belov, D.I. (2014a). Detection of aberrant answer changes via Kullback-Leibler divergence (Report No. RR-14-04). LSAC.
  • Belov, D.I. (2014b). Detecting item preknowledge in computerized adaptive testing using information theory and combinatorial optimization. Journal of Computerized Adaptive Testing, 2(3), 37-58.
  • Belov, D.I. (2016). Comparing the performance of eight item preknowledge detection statistics. Applied Psychological Measurement, 40(2), 83-97. https://doi.org/10.1177/0146621615603327
  • Cizek, G., & Wollack, J. (2017). Exploring cheating on tests – the context, the concern, and the challenges. In G. Cizek & J. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 3-19). Routledge.
  • Hauser, C., Kingsbury, G.G., & Houser, R.L. (2011, April 9-11). Individual score validity: Using the wariness index to identify test performance to treat with caution. [Paper presentation]. National Council on Measurement in Education Annual Meeting, New Orleans, LA, USA.
  • He, Q., Meadows, M., & Black, B. (2018). Statistical techniques for studying anomaly in test results: A review of literature. The Office of Qualifications and Examinations Regulation. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/690007/Statistical_techniques_for_studying_anomaly_in_test_results-_a_review_of_literature.pdf
  • Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16(4), 277-298. https://doi.org/10.1207/S15324818AME1604_2
  • van Krimpen-Stoop, E.M.L.A., & Meijer, R.R. (2001). CUSUM-based person-fit statistics for adaptive testing. Journal of Educational and Behavioral Statistics, 26(2), 199-217. https://doi.org/10.3102/10769986026002199
  • Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79-86. http://mathfaculty.fullerton.edu/sbehseta/Kullback.pdf
  • Levine, M.V., & Rubin, D.B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4(4), 269-290. https://doi.org/10.2307/1164595
  • Magis, D., Raîche, G., & Béland, S. (2012). A didactic presentation of Snijders’s lz* index of person fit with emphasis on response model selection and ability estimation. Journal of Educational and Behavioral Statistics, 37(1), 57-81. https://doi.org/10.3102/1076998610396894
  • Measurement, Selection and Placement Center (ÖSYM). (2010, September 17). Announcement of cancellation of the 2010 KPSS undergraduate educational sciences test. https://www.osym.gov.tr/TR,3125/2010-kpss-lisans-egitim-bilimleri-testinin-iptali-17092010.html
  • Meijer, R.R. (1994). The number of Guttman errors as a simple and powerful person-fit statistic. Applied Psychological Measurement, 18(4), 311-314. https://doi.org/10.1177/014662169401800402
  • Meijer, R.R. (1996). Person-Fit research: An introduction. Applied Measurement in Education, 9(1), 3-8. https://doi.org/10.1207/s15324818ame0901_2
  • Meijer, R.R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107-135. https://doi.org/10.1177/01466210122031957
  • Meijer, R.R., & Sotaridona, L. S. (2006). Detection of advance item knowledge using response times in computer adaptive testing. Law School Admission Council. https://ris.utwente.nl/ws/portalfiles/portal/5129730/LSAC_CT-03-03.pdf
  • Meijer, R.R., & Tendeiro, J.N. (2014). The use of person-fit scores in high stakes educational testing: How to use them and what they tell us. Law School Admission Council.
  • Molenaar, I.W., & Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55, 75-106. https://doi.org/10.1007/BF02294745
  • Schnipke, D.L., & Scrams, D.J. (1999). Exploring issues of test taker behavior: Insights gained from response-time analyses. Law School Admission Council.
  • Seol, J., & Rubright, J.D. (2013). The impact of test characteristics on Kullback-Leibler divergence index to identify examinees with aberrant responses [Paper presentation]. The 2nd Annual Conference on Statistical Detection of Potential Test Fraud, Madison, WI, USA.
  • Shu, Z. (2010). Detecting test cheating using a deterministic, gated item response theory model [Unpublished doctoral dissertation]. The University of North Carolina.
  • Sideridis, G.D., & Zopluoglu, C. (2018). Validation of response similarity analysis for the detection of academic cheating: An experimental study. Journal of Applied Measurement, 19(1), 59-75.
  • Singmann, H. (2020). Complete environment for Bayesian inference (LaplacesDemon) (Version 16.1.4) [Computer software manual]. https://cran.r-project.org/web/packages/LaplacesDemon/LaplacesDemon.pdf
  • Sinharay, S., & Johnson, M.S. (2020). The use of item scores and response times to detect examinees who may have benefited from item preknowledge. British Journal of Mathematical and Statistical Psychology, 73(3), 397-419. https://doi.org/10.1111/bmsp.12187
  • Sotaridona, L.S., & Meijer, R.R. (2002). Statistical properties of the K-index for detecting answer copying in a multiple-choice test. Journal of Educational Measurement, 39(2), 115-132. https://www.jstor.org/stable/1435251
  • Sotaridona, L.S., & Meijer, R.R. (2003). Two new statistics to detect answer copying. Journal of Educational Measurement, 40(1), 53-70. https://www.jstor.org/stable/1435054
  • Sunbul, O., & Yormaz, S. (2018a). Effects of test level discrimination and difficulty on answer copying indices. International Journal of Evaluation and Research in Education, 7(1), 32-38. http://doi.org/10.11591/ijere.v7i1.11488
  • Sunbul, O., & Yormaz, S. (2018b). Investigating the performance of omega index according to item parameters and ability levels. Eurasian Journal of Educational Research, 74, 207-226. https://doi.org/10.14689/ejer.2018.74.11
  • Sahin, A. (2012). Madde Tepki Kuramı'nda test uzunluğu ve örneklem büyüklüğünün model veri uyumu, madde parametreleri ve standart hata değerlerine etkisinin incelenmesi [An investigation on the effects of test length and sample size in item response theory on model-data fit, item parameters and standard error values] [Unpublished doctoral dissertation]. Hacettepe University.
  • López-Ratón, M., Rodríguez-Álvarez, M.X., Cadarso-Suárez, C., & Gude-Sampedro, F. (2014). OptimalCutpoints: An R package for selecting optimal cutpoints in diagnostic tests. Journal of Statistical Software, 61(8), 1-36. https://doi.org/10.18637/jss.v061.i08
  • Tendeiro, J.N. (2018). Person Fit (PerFit) (Version 1.4.3) [Computer software manual]. https://cran.r-project.org/web/packages/PerFit/PerFit.pdf
  • Tendeiro, J.N., & Meijer, R.R. (2012). A CUSUM to detect person misfit: A discussion and some alternatives for existing procedures. Applied Psychological Measurement, 36(5), 420-442. https://doi.org/10.1177/0146621612446305
  • Tendeiro, J.N., & Meijer, R.R. (2014). Detection of invalid test scores: The usefulness of simple nonparametric statistics. Journal of Educational Measurement, 51(3), 239-259. https://doi.org/10.1111/jedm.12046
  • Thiessen, B. (2008). Relationship between test security policies and test score manipulations. [Unpublished doctoral dissertation]. University of Iowa.
  • van der Linden, W.J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31(2), 181-204. https://doi.org/10.3102/10769986031002181
  • van der Linden, W.J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72, 287-308. https://doi.org/10.1007/s11336-006-1478-z
  • van der Linden, W.J. (2009). A bivariate log-normal response-time model for the detection of collusion between test takers. Journal of Educational and Behavioral Statistics, 34(3), 378-394. https://doi.org/10.3102/1076998609332107
  • van der Linden, W.J., & Guo, F.M. (2008). Bayesian procedures for identifying aberrant response time patterns in adaptive testing. Psychometrika, 73(3), 365-384. https://doi.org/10.1007/s11336-007-9046-8
  • van der Linden, W.J., & van Krimpen-Stoop, E.M. (2003). Using response times to detect aberrant responses in computerized adaptive testing. Psychometrika, 68(2), 251-265. https://doi.org/10.1007/BF02294800
  • Voncken, L. (2014). Comparison of the lz* person-fit index and ω copying-index in copying detection. [First Year Paper]. Tilburg University. http://arno.uvt.nl/show.cgi?fid=135361
  • Wang, C., & Xu, G. (2015). A mixture hierarchical model for response times and response accuracy. British Journal of Mathematical and Statistical Psychology, 68(3), 456-477. https://doi.org/10.1111/bmsp.12054
  • Wise, S.L., & DeMars, C.E. (2006). An application of item response time: The effort-moderated IRT model. Journal of Educational Measurement, 43(1), 19-38. https://doi.org/10.1111/j.1745-3984.2006.00002.x
  • Wise, S.L., & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163-183. https://doi.org/10.1207/s15324818ame1802_2
  • Wise, S.L., Ma, L., Kingsbury, G.G., & Hauser, C. (2010, May 1-3). An investigation of the relationship between time of testing and test-taking effort [Paper presentation]. National Council on Measurement in Education Annual Meeting, Denver, CO, USA.
  • Wollack, J.A. (1997). A nominal response model approach for detecting answer copying. Applied Psychological Measurement, 21(4), 307-320. https://doi.org/10.1177/01466216970214002
  • Wollack, J.A. (2003). Comparison of answer copying indices with real data. Journal of Educational Measurement, 40(3), 189-205. https://doi.org/10.1111/j.1745-3984.2003.tb01104.x
  • Wollack, J.A. (2006). Simultaneous use of multiple answer copying indexes to improve detection rates. Applied Measurement in Education, 19(4), 265-288. https://doi.org/10.1207/s15324818ame1904_3
  • Wollack, J.A., & Cohen, A.S. (1998). Detection of answer copying with unknown item and trait parameters. Applied Psychological Measurement, 22(2), 144 152. https://doi.org/10.1177/01466216980222004
  • Wollack, J.A., & Maynes, D. (2011, April 9-11). Detection of test collusion using item response data [Paper presentation]. National Council on Measurement in Education Annual Meeting, New Orleans, LA, USA.
  • Yormaz, S., & Sunbul, O. (2017). Determination of type I error rates and power of answer copying indices under various conditions. Educational Sciences: Theory & Practice, 17(1), 5-26. https://doi.org/10.12738/estp.2017.1.0105
  • Yormaz, S. (2019). Test güvenliği açısından bireyler arasındaki olası iş birliğinin incelenmesi [Investigation of possible collusion between examinees in terms of test security] [Unpublished doctoral dissertation]. Mersin University.
  • Zhan, P., Jiao, H., Man, K., Wang, W.C., & He, K. (2021). Variable speed across dimensions of ability in the joint model for responses and response times. Frontiers in Psychology, 12, Article 469196. https://doi.org/10.3389/fpsyg.2021.469196
  • Zopluoglu, C. (2016). Classification performance of answer-copying indices under different types of IRT models. Applied Psychological Measurement, 40(8), 592-607. https://doi.org/10.1177/0146621616664724
  • Zopluoglu, C. (2017). Similarity, answer copying, and aberrance: Understanding the status quo. In G. Cizek & J. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 44-72). Routledge.
  • Zopluoglu, C., & Davenport, E.C., Jr. (2012). The empirical power and type I error rates of the GBT and ω indices in detecting answer copying on multiple-choice tests. Educational and Psychological Measurement, 72(6), 975-1000. https://doi.org/10.1177/0013164412442941
  • Zopluoglu, C. (2018). Computing response similarity indices for multiple-choice tests (CopyDetect) (Version 1.3) [Computer software manual]. https://cran.r-project.org/web/packages/CopyDetect/index.html

There are 62 citations in total.

Details

Primary Language English
Subjects Simulation Study
Journal Section Research Article
Authors

Arzu Uçar 0000-0002-0099-1348

C. Deha Doğan 0000-0003-0683-1334

Submission Date September 16, 2024
Acceptance Date September 27, 2025
Publication Date January 2, 2026
Published in Issue Year 2026 Volume: 13 Issue: 1

Cite

APA Uçar, A., & Doğan, C. (2026). Detecting suspicious persons in multiple-choice tests: Comparison of performance of ω and GBT indexes. International Journal of Assessment Tools in Education, 13(1), 95-107. https://doi.org/10.21449/ijate.1550949
