Investigation of Classification Accuracy, Test Length and Measurement Precision at Computerized Adaptive Classification Tests

Seda Demir; Burcu Atar

doi:10.21031/epod.787865

Research Article

Year 2021, Volume 12, Issue 1, pp. 15-27, 31.03.2021

Cited By: 1

Abstract

This study compares two classification criteria, the Sequential Probability Ratio Test (SPRT) and the Confidence Interval (CI), and two item selection methods, Maximum Fisher Information based on the estimated ability (MFI-EB) and based on the cut point (MFI-CB), in Computerized Adaptive Classification Testing (CACT), with Weighted Likelihood Estimation (WLE) as the ability estimation method. The comparison uses Average Classification Accuracy (ACA), Average Test Length (ATL), and measurement precision under two content balancing methods (Constrained Computerized Adaptive Testing, CCAT, and the Modified Multinomial Model, MMM) and two item exposure control methods (the Sympson-Hetter method, SH, and the Item Eligibility method, IE), with examinees classified into two, three, or four categories using a unidimensional pool of dichotomous items. Forty-eight conditions were created in a Monte Carlo (MC) simulation; the data, generated in R, comprised 500 items and 5,000 examinees, and the results were computed over 30 replications. CI performed better in terms of ATL, whereas SPRT performed better in terms of ACA and of the correlation, bias, Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) values, and MFI-EB was more useful than MFI-CB. MMM was more successful in content balancing, whereas CCAT was better in terms of test efficiency (ATL and ACA); IE was superior in item exposure control, whereas SH was more beneficial for test efficiency. Finally, increasing the number of classification categories increased ATL and decreased ACA but yielded better correlation, bias, RMSE, and MAE values.
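The abstract names the classification criteria and item selection methods but does not show how they interact during a test. The Python sketch below is a hypothetical illustration, not the authors' R implementation. It rests on several assumptions that the abstract does not state: a two-parameter logistic (2PL) model for the dichotomous items; a single cut score, which gives two categories; an SPRT indifference region of ±0.2 around the cut with α = β = 0.05; a 95% confidence interval (z = 1.96); a maximum test length of 30 items; and a WLE computed by grid search. For the 2PL, Warm's WLE coincides with maximizing log L(θ) + ½ log I(θ). Content balancing (CCAT, MMM) and exposure control (SH, IE) are omitted. Every function name and the uniform item-parameter ranges are made up for this example.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a correct response (the 2PL is an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def log_lik(theta, items, resp):
    """Log-likelihood of a 0/1 response vector at theta."""
    total = 0.0
    for (a, b), u in zip(items, resp):
        p = p_correct(theta, a, b)
        total += math.log(p if u else 1.0 - p)
    return total

def wle(items, resp):
    """Grid-search WLE on [-4, 4]; for the 2PL it maximizes log L + 0.5 log I."""
    best_theta, best_value = 0.0, -math.inf
    for k in range(-80, 81):
        theta = k / 20.0
        info = sum(item_info(theta, a, b) for a, b in items)
        value = log_lik(theta, items, resp) + 0.5 * math.log(info)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta

def sprt_decision(items, resp, cut, delta=0.2, alpha=0.05, beta=0.05):
    """Wald's SPRT, cut+delta vs cut-delta: 1 = above, 0 = below, None = go on."""
    llr = log_lik(cut + delta, items, resp) - log_lik(cut - delta, items, resp)
    if llr >= math.log((1.0 - beta) / alpha):
        return 1
    if llr <= math.log(beta / (1.0 - alpha)):
        return 0
    return None

def ci_decision(items, cut, theta_hat, z=1.96):
    """CI criterion: classify once theta_hat +/- z * SE excludes the cut."""
    se = 1.0 / math.sqrt(sum(item_info(theta_hat, a, b) for a, b in items))
    if theta_hat - z * se > cut:
        return 1
    if theta_hat + z * se < cut:
        return 0
    return None

def select_item(pool, used, target):
    """MFI: the unused item with maximum information at target."""
    free = (i for i in range(len(pool)) if i not in used)
    return max(free, key=lambda i: item_info(target, *pool[i]))

def run_cact(true_theta, pool, cut, criterion, selection, max_len=30, rng=None):
    """One variable-length CACT; returns (decision, test length, final WLE)."""
    rng = rng if rng is not None else random.Random()
    items, resp, used = [], [], set()
    theta_hat, decision = 0.0, None
    while len(items) < max_len and decision is None:
        # MFI-CB targets the cut score; MFI-EB targets the current estimate.
        target = cut if selection == "MFI-CB" else theta_hat
        idx = select_item(pool, used, target)
        used.add(idx)
        a, b = pool[idx]
        items.append((a, b))
        resp.append(1 if rng.random() < p_correct(true_theta, a, b) else 0)
        if selection == "MFI-EB" or criterion == "CI":
            theta_hat = wle(items, resp)
        if criterion == "SPRT":
            decision = sprt_decision(items, resp, cut)
        else:
            decision = ci_decision(items, cut, theta_hat)
    theta_hat = wle(items, resp)  # final estimate for the precision indices
    if decision is None:  # maximum length reached: compare estimate to cut
        decision = int(theta_hat >= cut)
    return decision, len(items), theta_hat

def summarize(results, thetas, cut):
    """ACA, ATL, bias, RMSE and MAE over simulated examinees."""
    n = len(results)
    errors = [est - t for (_, _, est), t in zip(results, thetas)]
    return {
        "ACA": sum(d == int(t >= cut) for (d, _, _), t in zip(results, thetas)) / n,
        "ATL": sum(length for _, length, _ in results) / n,
        "bias": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
    }

if __name__ == "__main__":
    rng = random.Random(2021)
    # Hypothetical pool and sample, far smaller than the study's 5,000 examinees.
    pool = [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0)) for _ in range(500)]
    thetas = [rng.gauss(0.0, 1.0) for _ in range(40)]
    for criterion, selection in [("SPRT", "MFI-CB"), ("CI", "MFI-EB")]:
        results = [run_cact(t, pool, 0.0, criterion, selection, rng=rng) for t in thetas]
        print(criterion, selection, summarize(results, thetas, 0.0))
```

Running the file prints ACA, ATL, bias, RMSE, and MAE for an SPRT/MFI-CB condition and a CI/MFI-EB condition on a small simulated sample. The values depend on the seed and on the assumptions above; they are not the study's results. The grid search keeps the WLE step short and robust, and the estimate stays finite even for all-correct or all-incorrect patterns. The cost is a resolution of 0.05; Newton-Raphson iterations would be the usual refinement.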

Keywords

Computerized adaptive classification testing, content balancing, item exposure control, classification criteria, item selection methods

References

Bao, Y., Shen, Y., Wang, S., & Bradshaw, L. (2021). Flexible computerized adaptive tests to detect misconceptions and estimate ability simultaneously. Applied Psychological Measurement, 45(1), 3-21. doi: 10.1177/0146621620965730
Dooley, K. (2002). Simulation research methods. In J. Baum (Ed.), Companion to organizations (pp. 829-848). London: Blackwell.
Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23(3), 249-261. doi: 10.1177/01466219922031365
Eggen, T. J. H. M., & Straetmans, G. J. J. M. (2000). Computerized adaptive testing for classifying examinees into three categories. Educational and Psychological Measurement, 60(5), 713-734. doi: 10.1177/00131640021970862
Fan, Z., Wang, C., Chang, H., & Douglas, J. (2012). Utilizing response time distributions for item selection in CAT. Journal of Educational and Behavioral Statistics, 37(5), 655-670. doi: 10.3102/1076998611422912
Finkelman, M. (2008). On using stochastic curtailment to shorten the SPRT in sequential mastery testing. Journal of Educational and Behavioral Statistics, 33(4), 442-463. doi: 10.3102/1076998607308573
Gündeğer, C., & Doğan, N. (2018a). A comparison of computerized adaptive classification test criteria in terms of test efficiency and measurement precision. Journal of Measurement and Evaluation in Education and Psychology, 9(2), 161-177. doi: 10.21031/epod.401077
Gündeğer, C., & Doğan, N. (2018b). The effects of item pool characteristics on test length and classification accuracy in computerized adaptive classification testings. Hacettepe University Journal of Education, 33(4), 888-896. doi: 10.16986/HUJE.2016024284
Huebner, A. (2012). Item overexposure in computerized classification tests using sequential item selection. Practical Assessment, Research & Evaluation, 17(12), 1-9. Retrieved from https://pareonline.net/getvn.asp?v=17&n=12
Huebner, A., & Li, Z. (2012). A stochastic method for balancing item exposure rates in computerized classification tests. Applied Psychological Measurement, 36(3), 181-188. doi: 10.1177/0146621612439932
Kingsbury, G. G., & Weiss, D. J. (1980). A comparison of adaptive, sequential and conventional testing strategies for mastery decisions (Research Report 80-4). Minneapolis, MN: University of Minnesota. Retrieved from http://iacat.org/sites/default/files/biblio/ki80-04.pdf
Kingsbury, G. G., & Weiss, D. J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait theory and computerized adaptive testing (pp. 237-254). New York: Academic Press.
Kingsbury, G. G., & Zara, A. R. (1989). Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education, 2(4), 359-375. doi: 10.1207/s15324818ame0204_6
Lau, C. A. (1996). Robustness of a unidimensional computerized testing mastery procedure with multidimensional testing data (Unpublished doctoral dissertation). University of Iowa, Iowa City, IA.
Lau, C. A., & Wang, T. (1999, April). Computerized classification testing under practical constraints with a polytomous model. Paper presented at the annual meeting of the American Educational Research Association (AERA), Montreal, Canada. Retrieved from http://iacat.org/sites/default/files/biblio/la99-01.pdf
Leroux, A. J., Waid-Ebbs, J. K., Wen, P.-S., Helmer, D. A., Graham, D. P., O’Connor, M. K., & Ray, K. (2019). An investigation of exposure control methods with variable-length CAT using the partial credit model. Applied Psychological Measurement, 43(8), 624-638. doi: 10.1177/0146621618824856
Leung, C.-K., Chang, H. H., & Hau, K. T. (2002). Item selection in computerized adaptive testing: Improving the a-stratified design with the Sympson–Hetter algorithm. Applied Psychological Measurement, 26(4), 376-392. doi: 10.1177/014662102237795
Lin, C. (2011). Item selection criteria with practical constraints for computerized classification testing. Educational and Psychological Measurement, 71(1), 20-36. doi: 10.1177/0013164410387336
Lin, C. J., & Spray, J. (2000). Effects of item-selection criteria on classification testing with the sequential probability ratio test (ACT Research Report 2000-8). Iowa City, IA: ACT. Retrieved from https://eric.ed.gov/?id=ED445066
Miller, I., & Miller, M. (2004). John E. Freund’s mathematical statistics with applications. (7th Ed.). New Jersey: Prentice Hall.
Nydick, S. W., Nozawa, Y., & Zhu, R. (2012, April). Accuracy and efficiency in classifying examinees using computerized adaptive tests: An application to a large-scale test. Paper presented at the annual meeting of the National Council on Measurement in Education (NCME), Vancouver, British Columbia, Canada. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.476.3381&rep=rep1&type=pdf
R Core Team (2013). R: A language and environment for statistical computing (Version 3.0.1) [Computer software]. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org/
Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.), New horizons in testing: Latent trait theory and computerized adaptive testing (pp. 237-254). New York: Academic Press.
Sie, H., Finkelman, M. D., Riley, B., & Smits, N. (2015). Utilizing response times in computerized classification testing. Applied Psychological Measurement, 39(5), 389-405. doi: 10.1177/0146621615569504
Spray, J. A., & Reckase, M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized test. Journal of Educational and Behavioral Statistics, 21(4), 405-414. doi: 10.3102/10769986021004405
Sympson, J. B., & Hetter, R. D. (1985, October). Controlling item exposure rates in computerized adaptive testing. In Proceedings of the 27th annual meeting of the Military Testing Association (pp. 973-977). San Diego, CA: Navy Personnel Research and Development Center. Retrieved from http://www.iacat.org/content/controlling-item-exposure-rates-computerized-adaptive-testing
Thompson, N. A. (2007a). A comparison of two methods of polytomous computerized classification testing for multiple cutscores (Unpublished doctoral dissertation). University of Minnesota, Minneapolis.
Thompson, N. A. (2007b). A practitioner’s guide for variable-length computerized classification testing. Practical Assessment, Research & Evaluation, 12(1), 1-13. Retrieved from http://www.iacat.org/sites/default/files/biblio/th07-01.pdf
Thompson, N. A. (2009). Item selection in computerized classification testing. Educational and Psychological Measurement, 69(5), 778-793. doi: 10.1177/0013164408324460
Thompson, N. A. (2011). Termination criteria for computerized classification testing. Practical Assessment, Research & Evaluation, 16(4), 1-7. Retrieved from https://pareonline.net/getvn.asp?v=16&n=4
Thompson, N. A., & Ro, S. (2007). Computerized classification testing with composite hypotheses. In D. J. Weiss (Ed.), Proceedings of the 2007 GMAC conference on computerized adaptive testing. Retrieved from http://www.iacat.org/sites/default/files/biblio/cat07nthompson.pdf
Van der Linden, W. J., & Veldkamp, B. P. (2004). Constraining item exposure in computerized adaptive testing with shadow tests. Journal of Educational and Behavioral Statistics, 29(3), 273-291. doi: 10.3102/10769986029003273
Van Groen, M. M., Eggen, T. J. H. M., & Veldkamp, B. P. (2016). Multidimensional computerized adaptive testing for classifying examinees with within-dimensionality. Applied Psychological Measurement, 40(6), 387-404. doi: 10.1177/0146621616648931
Wang, S., & Wang, T. (2001). Precision of Warm’s weighted likelihood estimates for a polytomous model in computerized adaptive testing. Applied Psychological Measurement, 25(4), 317-331. doi: 10.1177/01466210122032163
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450. doi: 10.1007/BF02294627
Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361-375. doi: 10.1111/j.1745-3984.1984.tb01040.x
Wouda, J. T., & Eggen, T. J. H. M. (2009). Computerized classification testing in more than two categories by using stochastic curtailment. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC conference on computerized adaptive testing. Retrieved from http://iacat.org/sites/default/files/biblio/cat09wouda.pdf
Yang, X., Poggio, J. C., & Glasnapp, D. R. (2006). Effects of estimation bias on multiple category classification with an IRT-based adaptive classification procedure. Educational and Psychological Measurement, 66(4), 545-564. doi: 10.1177/0013164405284031

There are 38 citations in total.

Details

Primary Language	English
Journal Section	Articles
Authors	Seda Demir (ORCID 0000-0003-4230-5593), Burcu Atar (ORCID 0000-0003-3527-686X)
Publication Date	March 31, 2021
Acceptance Date	February 21, 2021
Published in Issue	Year 2021 Volume: 12 Issue: 1

Cite

APA	Demir, S., & Atar, B. (2021). Investigation of Classification Accuracy, Test Length and Measurement Precision at Computerized Adaptive Classification Tests. Journal of Measurement and Evaluation in Education and Psychology, 12(1), 15-27. https://doi.org/10.21031/epod.787865

Cited By

Comparison of Different Computerized Adaptive Testing Approaches with Shadow Test Under Different Test Length and Ability Estimation Method Conditions

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi (Journal of Measurement and Evaluation in Education and Psychology)

https://doi.org/10.21031/epod.1202599
