Research Article

Investigating The Performance of Item Selection Algorithms in Cognitive Diagnosis Computerized Adaptive Testing

Year 2024, Volume: 15 Issue: 2, 148 - 165, 30.06.2024
https://doi.org/10.21031/epod.1456094

Abstract

This study examined the performance of item selection algorithms in cognitive diagnosis computerized adaptive testing (CD-CAT). In fixed-length CD-CAT, performance was evaluated in terms of measurement accuracy and computation time as a function of test length, number of attributes, and item quality; in variable-length CD-CAT, it was evaluated in terms of average test length and computation time as a function of number of attributes and item quality. Two simulation studies were conducted, one for the fixed-length and one for the variable-length tests. Item responses were generated according to the DINA model. Two item banks of 480 items each were generated, for 5 and 6 attributes, and used in both the fixed- and variable-length tests. The Q-matrix was generated item by item and attribute by attribute. A total of 3,000 examinees were generated such that each examinee had a 50% chance of mastering each attribute. Examinees' cognitive patterns were estimated using MAP. In the variable-length CD-CAT, the threshold for the highest posterior probability was 0.80 and the threshold for the second-highest posterior probability was 0.10. The CD-CAT administrations and all other analyses were conducted in R 3.6.1. For the fixed-length CD-CAT, increasing the number of attributes decreased the pattern recovery rates of the item selection algorithms, whereas higher item quality and longer tests improved them; the JSD and MPWKL algorithms yielded the highest pattern recovery rates. For the variable-length CD-CAT, the average test length increased with the number of attributes and decreased with higher item quality, and across all conditions the JSD algorithm yielded the shortest average test length. Finally, the GDI algorithm had the shortest computation time in all scenarios, whereas the MPWKL algorithm had the longest.
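The simulation pipeline described in the abstract (DINA response generation, 50% attribute-mastery probability, MAP classification, and the two-threshold stopping rule for variable-length CD-CAT) can be sketched as follows. This is a minimal illustration, not the authors' code: the Q-matrix scheme, slip/guess ranges, and the uniform prior are assumptions, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

K, J, N = 5, 480, 3000  # attributes, item-bank size, examinees (as in the study)

# Each examinee masters each attribute independently with probability .50.
alpha = rng.binomial(1, 0.5, size=(N, K))

# Illustrative Q-matrix (the study's generation scheme is not detailed in the
# abstract): each item requires 1-3 randomly chosen attributes.
Q = np.zeros((J, K), dtype=int)
for j in range(J):
    Q[j, rng.choice(K, size=rng.integers(1, 4), replace=False)] = 1

# DINA slip (s) and guess (g) parameters; smaller values mean higher item quality.
s = rng.uniform(0.05, 0.15, J)
g = rng.uniform(0.05, 0.15, J)

def dina_prob(alpha_i, Q, s, g):
    """P(X_j = 1 | alpha) under DINA: 1 - s_j if all attributes required by
    item j are mastered, g_j otherwise."""
    eta = np.all((Q == 0) | (alpha_i == 1), axis=1)
    return np.where(eta, 1 - s, g)

# Generate item responses for every examinee.
responses = np.array([rng.binomial(1, dina_prob(a, Q, s, g)) for a in alpha])

# All 2^K candidate attribute patterns.
patterns = np.array([[(c >> k) & 1 for k in range(K)] for c in range(2 ** K)])

def pattern_posterior(x, Q, s, g, patterns):
    """Posterior over attribute patterns given responses x (uniform prior
    assumed; the abstract does not state the prior actually used)."""
    loglik = np.array([
        np.sum(x * np.log(dina_prob(p, Q, s, g)) +
               (1 - x) * np.log(1 - dina_prob(p, Q, s, g)))
        for p in patterns
    ])
    post = np.exp(loglik - loglik.max())
    return post / post.sum()

def map_estimate(x, Q, s, g, patterns):
    """MAP classification: the attribute pattern with the highest posterior."""
    return patterns[np.argmax(pattern_posterior(x, Q, s, g, patterns))]

def stop_rule(post, t1=0.80, t2=0.10):
    """Variable-length termination: stop when the largest posterior
    probability is at least t1 and the second largest is at most t2."""
    top = np.sort(post)[::-1]
    return bool(top[0] >= t1 and top[1] <= t2)
```

With the full 480-item bank and low slip/guess values, MAP classification recovers nearly all simulated patterns; in the actual CD-CAT, `stop_rule` would be checked after each administered item rather than after the whole bank.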

Ethical Statement

This study was produced from the first author's doctoral dissertation.

References

  • Bennett, R. E. (2011). Formative assessment: a critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–25. https://doi.org/10.1080/0969594X.2010.513678
  • Black, P., & Wiliam, D. (2018). Classroom assessment and pedagogy. Assessment in Education: Principles, Policy & Practice, 25(6), 551–575. https://doi.org/10.1080/0969594X.2018.1441807
  • Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619–632. https://doi.org/10.1007/s11336-009-9123-2
  • de Ayala, R. J. (2009). The theory and practice of item response theory. Guilford Press.
  • de la Torre, J. (2009). A cognitive diagnosis model for cognitively-based multiple-choice options. Applied Psychological Measurement, 33(3), 163–183. https://doi.org/10.1177/0146621608320523
  • de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179-199. https://doi.org/10.1007/S11336-011-9207-7
  • DeCarlo, L. T. (2011). On the analysis of fraction subtraction data: The DINA model, classification, latent class sizes, and the Q-matrix. Applied Psychological Measurement, 35(1), 8–26. https://doi.org/10.1177/0146621610377081
  • DiBello, L., Roussos, L. A., & Stout, W. F. (2007). Review of cognitively diagnostic assessment and a summary of psychometric models. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 979–1030). https://doi.org/10.1016/S0169-7161(06)26031-0
  • Heritage, M. (2010). Formative assessment: Making it happen in the classroom. Corwin Press, https://doi.org/10.4135/9781452219493
  • Hsu, C. L., Wang, W. H., & Chen, S. Y. (2013). Variable-length computerized adaptive testing based on cognitive diagnosis models. Applied Psychological Measurement, 37(7), 563-582. https://doi.org/10.1177/0146621613488642
  • Huang, H. (2018). Effects of item calibration errors on computerized adaptive testing under cognitive diagnosis models. Journal of Classification, 35, 437–465. https://doi.org/10.1007/s00357-018-9265-y
  • Izrailev, S. (2020). tictoc: Functions for timing R scripts, as well as implementations of "Stack" and "StackList" structures. R package version 1.2.1. https://CRAN.R-project.org/package=tictoc
  • Kaplan, M. (2016). Nitelik sayısının madde seçme algoritmalarının performansı üzerindeki etkisi [The effect of the number of attributes on the performance of item selection algorithms]. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 7(2), 285–295. https://doi.org/10.21031/epod.268486
  • Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167–188. https://doi.org/10.1177/0146621614554650
  • Lin, C.-J., & Chang, H.-H. (2019). Item selection criteria with practical constraints in cognitive diagnostic computerized adaptive testing. Educational and Psychological Measurement, 79(2), 335–357. https://doi.org/10.1177/0013164418790634
  • Liu, H., You, X., Wang, W., Ding, S., & Chang, H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30, 152-172. https://doi.org/10.1007/s00357-013-9128-5
  • Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized adaptive and multistage testing with R: Using packages catR and mstR. Cham, Switzerland: Springer International Publishing.
  • McGlohen, M.K., & Chang, H. (2008). Combining computer adaptive testing technology with cognitively diagnostic assessment. Behavioral Research Methods, 40, 808–821. https://doi.org/10.3758/BRM.40.3.808
  • Minchen, N. D., & de la Torre, J. (2016, July). The continuous G-DINA model and the Jensen-Shannon divergence. Paper presented at the International Meeting of the Psychometric Society, Asheville, NC.
  • Pellegrino, J. W., Baxter, G. P., & Glaser, R. (1999). Addressing the “Two Disciplines” Problem: Linking theories of cognition and learning with assessment and instructional practice. Review of Research in Education, 24(1), 307–353. https://doi.org/10.3102/0091732X024001307
  • R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Rupp, A. A., & Templin, J. L. (2008). Unique characteristics of diagnostic classification models: A comprehensive review of the current state-of-the-art. Measurement: Interdisciplinary Research and Perspectives, 6(4), 219–262. https://doi.org/10.1080/15366360802490866
  • Stiggins, R. J. (2002). Assessment Crisis: The Absence of Assessment for Learning. Phi Delta Kappan, 83(10), 758–765. https://doi.org/10.1177/003172170208301010
  • Stocking, M.L. (1994). Three practical issues for modern adaptive testing item pools. Ets research report series, 34. https://doi.org/10.1002/j.2333-8504.1994.tb01578.x
  • Tatsuoka, C., & Ferguson, T. (2003). Sequential classification on partially ordered sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1), 143–157.
  • Tatsuoka, C. (2002). Data analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 51(3), 337–350.
  • Thissen, D., & Mislevy, R. J. (2000). Testing algorithms. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 101–133). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Xu, X., Chang, H.-H., & Douglas, J. (2003, April). Computerized adaptive testing strategies for cognitive diagnosis. Paper presented at the annual meeting of National Council on Measurement in Education, Montreal, Quebec, Canada.
  • van der Linden, W. J., & Glas, C. A. W. (2002). Computerized adaptive testing: Theory and practice. Kluwer Academic Publishers.
  • von Davier, M. (2005). A general diagnostic model applied to language testing data (Research Report 05-16). Princeton, NJ: Educational Testing Service.
  • von Davier, M., & Cheng, Y. (2014). Multistage testing using diagnostic models. In D. L. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications (pp. 219–227). New York, NY: CRC Press.
  • Wang, C. (2013). Mutual Information Item Selection Method in Cognitive Diagnostic Computerized Adaptive Testing with Short Test Length. Educational and Psychological Measurement, 73(6), 1017–1035.
  • Wiliam, D. (2011). What Is Assessment for Learning? Studies in Educational Evaluation, 37, 3-14. https://doi.org/10.1016/j.stueduc.2011.03.001
  • Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
  • Yigit, H. D., Sorrel, M. A., & de la Torre, J. (2019). Computerized Adaptive Testing for Cognitively Based Multiple-Choice Data. Applied Psychological Measurement, 43(5), 388–401.
  • Zheng, C. (2015). Some practical item selection algorithms in cognitive diagnostic computerized adaptive testing—Smart diagnosis for smart learning. Unpublished Doctoral Dissertation. University of Illinois at Urbana–Champaign.
  • Zheng, C., & Chang, H.-H. (2016). High-efficiency response distribution–based item selection algorithms for short-length cognitive diagnostic computerized adaptive testing. Applied Psychological Measurement, 40, 608-6
There are 37 citations in total.

Details

Primary Language English
Subjects Testing, Assessment and Psychometrics (Other)
Journal Section Articles
Authors

Semih Aşiret 0000-0002-0577-2603

Seçil Ömür Sünbül 0000-0001-9442-1516

Publication Date June 30, 2024
Submission Date March 25, 2024
Acceptance Date June 21, 2024
Published in Issue Year 2024 Volume: 15 Issue: 2

Cite

APA Aşiret, S., & Ömür Sünbül, S. (2024). Investigating The Performance of Item Selection Algorithms in Cognitive Diagnosis Computerized Adaptive Testing. Journal of Measurement and Evaluation in Education and Psychology, 15(2), 148-165. https://doi.org/10.21031/epod.1456094