Research Article

Investigating The Performance of Item Selection Algorithms in Cognitive Diagnosis Computerized Adaptive Testing

Year 2024, Volume: 15 Issue: 2, 148 - 165, 30.06.2024
https://doi.org/10.21031/epod.1456094

Abstract

This study examined the performance of item selection algorithms in cognitive diagnosis computerized adaptive testing (CD-CAT). In fixed-length CD-CAT, performance was evaluated in terms of measurement accuracy and computation time as a function of test length, number of attributes, and item quality; in variable-length CD-CAT, it was evaluated in terms of average test length and computation time as a function of number of attributes and item quality. Two simulation studies were conducted, one for the fixed-length and one for the variable-length tests. Item responses were generated according to the DINA model. Two item banks of 480 items each were generated, for 5 and 6 attributes, and used in both the fixed- and variable-length tests. The Q-matrix was generated item by item and attribute by attribute. A total of 3,000 examinees were generated such that each examinee had a 50% chance of mastering each attribute. Examinees' cognitive patterns were estimated using MAP. In the variable-length CD-CAT, the threshold for the highest posterior probability was 0.80 and the threshold for the second-highest posterior probability was 0.10. The CD-CAT administrations and all other analyses were conducted in R 3.6.1. For the fixed-length CD-CAT, increasing the number of attributes decreased the pattern recovery rates of the item selection algorithms, whereas higher item quality and longer tests improved them; the JSD and MPWKL algorithms yielded the highest pattern recovery rates. For the variable-length CD-CAT, the average test length increased with the number of attributes and decreased with higher item quality, and across all conditions the JSD algorithm yielded the shortest average test length. Finally, the GDI algorithm had the shortest computation time in all scenarios, whereas the MPWKL algorithm had the longest.
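The simulation pipeline described in the abstract (DINA response generation, 50% attribute-mastery probability, MAP classification, and the two-threshold stopping rule for variable-length CD-CAT) can be sketched as follows. This is a minimal illustration, not the authors' code: the Q-matrix scheme, slip/guess ranges, and the uniform prior are assumptions, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

K, J, N = 5, 480, 3000  # attributes, item-bank size, examinees (as in the study)

# Each examinee masters each attribute independently with probability .50.
alpha = rng.binomial(1, 0.5, size=(N, K))

# Illustrative Q-matrix (the study's generation scheme is not detailed in the
# abstract): each item requires 1-3 randomly chosen attributes.
Q = np.zeros((J, K), dtype=int)
for j in range(J):
    Q[j, rng.choice(K, size=rng.integers(1, 4), replace=False)] = 1

# DINA slip (s) and guess (g) parameters; smaller values mean higher item quality.
s = rng.uniform(0.05, 0.15, J)
g = rng.uniform(0.05, 0.15, J)

def dina_prob(alpha_i, Q, s, g):
    """P(X_j = 1 | alpha) under DINA: 1 - s_j if all attributes required by
    item j are mastered, g_j otherwise."""
    eta = np.all((Q == 0) | (alpha_i == 1), axis=1)
    return np.where(eta, 1 - s, g)

# Generate item responses for every examinee.
responses = np.array([rng.binomial(1, dina_prob(a, Q, s, g)) for a in alpha])

# All 2^K candidate attribute patterns.
patterns = np.array([[(c >> k) & 1 for k in range(K)] for c in range(2 ** K)])

def pattern_posterior(x, Q, s, g, patterns):
    """Posterior over attribute patterns given responses x (uniform prior
    assumed; the abstract does not state the prior actually used)."""
    loglik = np.array([
        np.sum(x * np.log(dina_prob(p, Q, s, g)) +
               (1 - x) * np.log(1 - dina_prob(p, Q, s, g)))
        for p in patterns
    ])
    post = np.exp(loglik - loglik.max())
    return post / post.sum()

def map_estimate(x, Q, s, g, patterns):
    """MAP classification: the attribute pattern with the highest posterior."""
    return patterns[np.argmax(pattern_posterior(x, Q, s, g, patterns))]

def stop_rule(post, t1=0.80, t2=0.10):
    """Variable-length termination: stop when the largest posterior
    probability is at least t1 and the second largest is at most t2."""
    top = np.sort(post)[::-1]
    return bool(top[0] >= t1 and top[1] <= t2)
```

With the full 480-item bank and low slip/guess values, MAP classification recovers nearly all simulated patterns; in the actual CD-CAT, `stop_rule` would be checked after each administered item rather than after the whole bank.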

Ethical Statement

This study was produced from the first author's doctoral dissertation.

References

  • Bennett, R. E. (2011). Formative assessment: a critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–25. https://doi.org/10.1080/0969594X.2010.513678
  • Black, P., & Wiliam, D. (2018). Classroom assessment and pedagogy. Assessment in Education: Principles, Policy & Practice, 25(6), 551–575. https://doi.org/10.1080/0969594X.2018.1441807
  • Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619–632. https://doi.org/10.1007/s11336-009-9123-2
  • de Ayala, R. J. (2009). The theory and practice of item response theory. Guilford Press.
  • de la Torre, J. (2009). A cognitive diagnosis model for cognitively-based multiple-choice options. Applied Psychological Measurement, 33(3), 163–183. https://doi.org/10.1177/0146621608320523
  • de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179-199. https://doi.org/10.1007/S11336-011-9207-7
  • DeCarlo, L. T. (2011). On the analysis of fraction subtraction data: The DINA model, classification, latent class sizes, and the Q-matrix. Applied Psychological Measurement, 35(1), 8–26. https://doi.org/10.1177/0146621610377081
  • DiBello, L., Roussos, L. A., & Stout, W. F. (2007). Review of cognitively diagnostic assessment and a summary of psychometric models. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 979–1030). https://doi.org/10.1016/S0169-7161(06)26031-0
  • Heritage, M. (2010). Formative assessment: Making it happen in the classroom. Corwin Press, https://doi.org/10.4135/9781452219493
  • Hsu, C. L., Wang, W. H., & Chen, S. Y. (2013). Variable-length computerized adaptive testing based on cognitive diagnosis models. Applied Psychological Measurement, 37(7), 563-582. https://doi.org/10.1177/0146621613488642
  • Huang, H. (2018). Effects of item calibration errors on computerized adaptive testing under cognitive diagnosis models. Journal of Classification, 35, 437–465. https://doi.org/10.1007/s00357-018-9265-y
  • Izrailev, S. (2020). tictoc: Functions for timing R scripts, as well as implementations of "Stack" and "StackList" structures. R package version 1.2.1. https://CRAN.R-project.org/package=tictoc
  • Kaplan, M. (2016). Nitelik sayısının madde seçme algoritmalarının performansı üzerindeki etkisi [The effect of the number of attributes on the performance of item selection algorithms]. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 7(2), 285–295. https://doi.org/10.21031/epod.268486
  • Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167–188. https://doi.org/10.1177/0146621614554650
  • Lin, C.-J., & Chang, H.-H. (2019). Item selection criteria with practical constraints in cognitive diagnostic computerized adaptive testing. Educational and Psychological Measurement, 79(2), 335–357. https://doi.org/10.1177/0013164418790634
  • Liu, H., You, X., Wang, W., Ding, S., & Chang, H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30, 152-172. https://doi.org/10.1007/s00357-013-9128-5
  • Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized adaptive and multistage testing with R: Using packages catR and mstR. Cham, Switzerland: Springer International Publishing.
  • McGlohen, M.K., & Chang, H. (2008). Combining computer adaptive testing technology with cognitively diagnostic assessment. Behavioral Research Methods, 40, 808–821. https://doi.org/10.3758/BRM.40.3.808
  • Minchen, N. D., & de la Torre, J. (2016, July). The continuous G-DINA model and the Jensen-Shannon divergence. Paper presented at the International Meeting of the Psychometric Society, Asheville, NC.
  • Pellegrino, J. W., Baxter, G. P., & Glaser, R. (1999). Addressing the “Two Disciplines” Problem: Linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education, 24(1), 307–353. https://doi.org/10.3102/0091732X024001307
  • R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Rupp, A. A., & Templin, J. L. (2008). Unique characteristics of diagnostic classification models: A comprehensive review of the current state-of-the-art. Measurement: Interdisciplinary Research and Perspectives, 6(4), 219–262. https://doi.org/10.1080/15366360802490866
  • Stiggins, R. J. (2002). Assessment Crisis: The Absence of Assessment for Learning. Phi Delta Kappan, 83(10), 758–765. https://doi.org/10.1177/003172170208301010
  • Stocking, M.L. (1994). Three practical issues for modern adaptive testing item pools. Ets research report series, 34. https://doi.org/10.1002/j.2333-8504.1994.tb01578.x
  • Tatsuoka, C., & Ferguson, T. (2003). Sequential classification on partially ordered sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1), 143–157.
  • Tatsuoka, C. (2002). Data analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 51(3), 337–350.
  • Thissen, D., & Mislevy, R. J. (2000). Testing algorithms. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 101–133). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Xu, X., Chang, H.-H., & Douglas, J. (2003, April). Computerized adaptive testing strategies for cognitive diagnosis. Paper presented at the annual meeting of National Council on Measurement in Education, Montreal, Quebec, Canada.
  • van der Linden, W. J., & Glas, C. A. W. (2002). Computerized adaptive testing: Theory and practice. Kluwer Academic Publishers.
  • von Davier, M. (2005). A general diagnostic model applied to language testing data (Research Report 05-16). Princeton, NJ: Educational Testing Service.
  • von Davier, M., & Cheng, Y. (2014). Multistage testing using diagnostic models. In D. L. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications (pp. 219–227). New York, NY: CRC Press.
  • Wang, C. (2013). Mutual Information Item Selection Method in Cognitive Diagnostic Computerized Adaptive Testing with Short Test Length. Educational and Psychological Measurement, 73(6), 1017–1035.
  • Wiliam, D. (2011). What Is Assessment for Learning? Studies in Educational Evaluation, 37, 3-14. https://doi.org/10.1016/j.stueduc.2011.03.001
  • Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
  • Yigit, H. D., Sorrel, M. A., & de la Torre, J. (2019). Computerized Adaptive Testing for Cognitively Based Multiple-Choice Data. Applied Psychological Measurement, 43(5), 388–401.
  • Zheng, C. (2015). Some practical item selection algorithms in cognitive diagnostic computerized adaptive testing—Smart diagnosis for smart learning. Unpublished Doctoral Dissertation. University of Illinois at Urbana–Champaign.
  • Zheng, C., & Chang, H.-H. (2016). High-efficiency response distribution–based item selection algorithms for short-length cognitive diagnostic computerized adaptive testing. Applied Psychological Measurement, 40, 608-6
There are 37 citations in total.

Details

Primary Language English
Subjects Testing, Assessment and Psychometrics (Other)
Journal Section Articles
Authors

Semih Aşiret 0000-0002-0577-2603

Seçil Ömür Sünbül 0000-0001-9442-1516

Publication Date June 30, 2024
Submission Date March 25, 2024
Acceptance Date June 21, 2024
Published in Issue Year 2024 Volume: 15 Issue: 2

Cite

APA Aşiret, S., & Ömür Sünbül, S. (2024). Investigating The Performance of Item Selection Algorithms in Cognitive Diagnosis Computerized Adaptive Testing. Journal of Measurement and Evaluation in Education and Psychology, 15(2), 148-165. https://doi.org/10.21031/epod.1456094