Research Article

Comparison of CAT Procedures at Low Ability Levels: A Simulation Study and Analysis in the Context of Students with Disabilities

Year 2024, Volume: 13, Issue: 3, 547-559, 31.07.2024

Abstract

In computerized adaptive testing (CAT), estimates of extreme ability levels are more biased and less accurate than estimates of intermediate ability levels. This conflicts with the purpose of CAT, which is designed to target all ability levels. This study aims to identify the procedures that perform better at low ability levels, while remaining consistent with performance at other ability levels, by comparing various CAT procedures. In addition, using data from a large-scale test, it examined whether the identified procedures would perform similarly at the ability levels of students with disabilities, a group that unfortunately more often has extreme ability levels and for whom CAT offers advantages in many respects. A pool of 1,000 items and 1,000 examinees with a standard normal ability distribution were simulated using the Monte Carlo method. The CAT performances of 36 conditions, consisting of different item selection methods, ability estimation methods, and termination rules, were compared. The results showed that the precision-criterion termination rule, and the precision-criterion termination rule with a test length limit (20 items), each used together with the maximum likelihood ability estimation method and the Kullback-Leibler information item selection rule, performed better and more consistently across ability levels. These procedures also showed high performance at the ability levels of students with disabilities in the real data.
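As a rough illustration of the kind of simulation condition compared in the study, the sketch below implements a minimal dichotomous CAT loop in Python with Kullback-Leibler information item selection, grid-based maximum likelihood ability estimation, and a precision stopping rule combined with a 20-item length limit. The 2PL model, the SE threshold of 0.30, the item parameter distributions, and all function names are illustrative assumptions, not the study's actual settings or code; the reference list cites the catR package (Magis et al., 2018), which provides these procedures out of the box.

```python
"""Minimal, illustrative CAT sketch (not the study's code or exact settings).

Assumed for illustration: a 2PL item bank, Kullback-Leibler (KL) information
item selection around the provisional estimate, grid-based maximum likelihood
(ML) ability estimation, and a precision stopping rule (SE <= 0.30) combined
with a 20-item test length limit.
"""
import numpy as np

rng = np.random.default_rng(42)

# Simulated 2PL item bank: discrimination a, difficulty b (illustrative priors).
N_ITEMS = 1000
a = rng.lognormal(mean=0.0, sigma=0.25, size=N_ITEMS)
b = rng.normal(0.0, 1.0, size=N_ITEMS)

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def kl_index(theta_hat, a, b, delta=1.0, n_points=21):
    """KL information index: divergence between the response distributions at
    theta_hat and at theta, averaged over [theta_hat - delta, theta_hat + delta]
    (proportional to the integral, which is all that item ranking needs)."""
    thetas = np.linspace(theta_hat - delta, theta_hat + delta, n_points)
    p0 = p_correct(theta_hat, a, b)              # shape: (n_items,)
    p = p_correct(thetas[:, None], a, b)         # shape: (n_points, n_items)
    kl = p0 * np.log(p0 / p) + (1 - p0) * np.log((1 - p0) / (1 - p))
    return kl.mean(axis=0)

def ml_estimate(responses, a_used, b_used, grid=np.linspace(-4, 4, 801)):
    """Grid-based ML estimate of theta and its standard error (1/sqrt(information))."""
    p = p_correct(grid[:, None], a_used, b_used)
    loglik = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p), axis=1)
    theta_hat = grid[np.argmax(loglik)]
    p_hat = p_correct(theta_hat, a_used, b_used)
    info = np.sum(a_used ** 2 * p_hat * (1 - p_hat))   # 2PL test information
    return theta_hat, 1.0 / np.sqrt(info)

def run_cat(true_theta, se_target=0.30, max_items=20):
    """Administer a simulated CAT to one examinee; return estimate, SE, test length."""
    administered, responses = [], []
    theta_hat, se = 0.0, np.inf                  # start the test at theta = 0
    available = np.ones(N_ITEMS, dtype=bool)
    while se > se_target and len(administered) < max_items:
        scores = kl_index(theta_hat, a, b)
        scores[~available] = -np.inf             # never readminister an item
        j = int(np.argmax(scores))
        available[j] = False
        administered.append(j)
        responses.append(rng.random() < p_correct(true_theta, a[j], b[j]))
        theta_hat, se = ml_estimate(np.array(responses, dtype=float),
                                    a[administered], b[administered])
    return theta_hat, se, len(administered)

# Example: a low-ability examinee, the group the study focuses on.
theta_hat, se, n_used = run_cat(true_theta=-2.5)
print(f"theta_hat = {theta_hat:.2f}, SE = {se:.2f}, items = {n_used}")
```

Removing the max_items cap turns this into the variable-length precision-criterion rule on its own; the study compares both variants, among the 36 conditions, against fixed-length and other termination rules.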

References

  • Babcock, B., & Weiss, D. J. (2009). Termination criteria in computerized adaptive tests: Variable-length CATs are not biased. Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing.
  • Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2009). Item selection rules in computerized adaptive testing: Accuracy and security. Methodology, 5(1), 7–17. https://doi.org/10.1027/1614-2241.5.1.7
  • Belov, D. I., & Armstrong, R. D. (2009). Direct and inverse problems of item pool design for computerized adaptive testing. Educational and Psychological Measurement, 69(4), 533–547. https://doi.org/10.1177/0013164409332224
  • Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431–444. https://doi.org/10.1177/014662168200600405
  • Chen, S. K., Hou, L., Fitzpatrick, S. J., & Dodd, B. G. (1997). The effect of population distribution and method of theta estimation on computerized adaptive testing (CAT) using the rating scale model. Educational and Psychological Measurement, 57(3), 422–439. https://doi.org/10.1177/0013164497057003004
  • Choi, S. W., Grady, M. W., & Dodd, B. G. (2011). A new stopping rule for computerized adaptive testing. Educational and Psychological Measurement, 71(1), 37–53. https://doi.org/10.1177/0013164410387338
  • Embretson, S., & Reise, S. P. (2000). Item Response Theory for psychologists. Lawrence Erlbaum Associates.
  • Gibbons, R. D., Weiss, D. J., Frank, E., & Kupfer, D. (2016). Computerized adaptive diagnosis and testing of mental health disorders. Annual Review of Clinical Psychology, 12, 83–104. https://doi.org/10.1146/annurev-clinpsy-021815-093634
  • Gibbons, R. D., Weiss, D. J., Pilkonis, P. A., Frank, E., Moore, T., Kim, J. B., & Kupfer, D. J. (2014). Development of a computerized adaptive test for anxiety. American Journal of Psychiatry, 171(2), 187–194. https://doi.org/10.1176/appi.ajp.2013.13020178
  • Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Springer Science+Business Media. https://doi.org/10.1017/CBO9781107415324.004
  • He, W., Diao, Q., & Hauser, C. (2013). A comparison of four item-selection methods for severely constrained CATs. NCME Annual Meeting, 1–26.
  • Kezer, F., & Koç, N. (2014). A comparison of computerized adaptive testing strategies. Eğitim Bilimleri Araştırmaları Dergisi, 4(1), 145–174. https://doi.org/10.12973/jesr.2014.41.8
  • Linacre, J. M. (2000). Computer-adaptive testing: A methodology whose time has come. Komesa Press.
  • Lord, F. M. (1980). Applications of Item Response Theory to practical testing problems. Routledge.
  • Magis, D., Yan, D., & von Davier, A. A. (2018). Computerized adaptive and multistage testing with R: Using packages catR and mstR. Measurement: Interdisciplinary Research and Perspectives, 16(4). https://doi.org/10.1080/15366367.2018.1520560
  • Maurelli, V., & Weiss, D. J. (1981). Factors influencing the psychometric characteristics of an adaptive testing strategy for test batteries.
  • Ministry of National Education. (2018). Sınavla öğrenci alacak ortaöğretim kurumlarına ilişkin merkezî sınav başvuru ve uygulama kılavuzu [Application guide of central examination for secondary education institutions].
  • Mislevy, R. J., & Bock, R. D. (1982). Biweight estimates of latent ability. Educational and Psychological Measurement, 42(3), 725–737. https://doi.org/10.1177/001316448204200302
  • Reckase, M. D. (2010). Designing item pools to optimize the functioning of a computerized adaptive test. Psychological Test and Assessment Modeling, 52(2), 127–141. https://psycnet.apa.org/record/2010-17096-001
  • Riley, B. B., Conrad, K. J., Bezruczko, N., & Dennis, M. L. (2007). Relative precision, efficiency and construct validity of different starting and stopping rules for a computerized adaptive test: The GAIN substance problem scale. Journal of Applied Measurement, 8(1), 48–64. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5933849/
  • Sahin, A., & Ozbasi, D. (2017). Effects of content balancing and item selection method on ability estimation in computerized adaptive tests. Eurasian Journal of Educational Research, 69, 21–36. https://doi.org/10.14689/ejer.2017.69.2
  • Şahin, A., & Weiss, D. J. (2015). Effects of calibration sample size and item bank size on ability estimation in computerized adaptive testing. Educational Sciences: Theory & Practice, 15(6), 1585–1595. https://doi.org/10.12738/estp.2015.6.0102
  • Segall, D. O. (2004). Computerized adaptive testing. Encyclopedia of Social Measurement, 429–438. https://doi.org/10.1016/B0-12-369398-5/00444-8
  • Şenel, S., & Kutlu, Ö. (2018a). Computerized adaptive testing design for students with visual impairment. Eğitim ve Bilim, 43(194), 261–284. https://doi.org/10.15390/EB.2018.7515
  • Şenel, S., & Kutlu, Ö. (2018b). Comparison of two test methods for VIS: paper-pencil test and CAT. European Journal of Special Needs Education, 33(5), 631–645. https://doi.org/10.1080/08856257.2017.1391014
  • Şenel, S., & Şenel, H. C. (2018). Bilgisayar tabanlı testlerde evrensel tasarım: Özel gereksinimli öğrenciler için düzenlemeler [Universal design in computer-based testing: Test Accommodations for students with special needs]. In S. Dinçer (Ed.), Değişen dünyada eğitim (1st ed., pp. 113–124). Pegem Akademi. https://doi.org/10.14527/9786052412480.08
  • Seo, D. G., & Choi, J. (2018). Post-hoc simulation study of computerized adaptive testing for the Korean Medical Licensing Examination. Journal of Educational Evaluation for Health Professions, 15, 14. https://doi.org/10.3352/jeehp.2018.15.14
  • Seo, D. G., & Weiss, D. J. (2015). Best design for multidimensional computerized adaptive testing with the bifactor model. Educational and Psychological Measurement, 75(6), 954–978. https://doi.org/10.1177/0013164415575147
  • Stone, E., & Davey, T. (2011). Computer-adaptive testing for students with disabilities: A review of the literature. ETS Research Report Series, 2011(2), i–24. https://doi.org/10.1002/j.2333-8504.2011.tb02268.x
  • van der Linden, W. J., Ariel, A., & Veldkamp, B. P. (2006). Assembling a computerized adaptive testing item pool as a set of linear tests. Journal of Educational and Behavioral Statistics, 31(1), 81–99. https://doi.org/10.3102/10769986031001081
  • van der Linden, W. J., & Glas, C. A. W. (2010). Elements of adaptive testing. Springer.
  • Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., & Mislevy, R. J. (2000a). Computerized adaptive testing: A primer. Routledge.
  • Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., & Mislevy, R. J. (2000b). Computerized adaptive testing: A primer (2nd ed.). Lawrence Erlbaum Associates.
  • Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450. https://doi.org/10.1007/BF02294627
  • Weiss, D. J. (1973). The stratified adaptive computerized ability test.
  • Weiss, D. J. (2011). Better data from better measurements using computerized adaptive testing. Journal of Methods and Measurement in the Social Sciences, 2(1), 1. https://doi.org/10.2458/jmm.v2i1.12351
  • Yao, L. (2013). Comparing the performance of five multidimensional CAT selection procedures with different stopping rules. Applied Psychological Measurement, 37(1), 3–23. https://doi.org/10.1177/0146621612455687

There are 37 references in total.

Details

Primary Language: English
Subjects: Studies on Education
Section: Articles
Authors

Selma Şenel (ORCID: 0000-0002-5803-0793)

Early View Date: July 18, 2024
Publication Date: July 31, 2024
Published in Issue: Year 2024, Volume: 13, Issue: 3

How to Cite

APA: Şenel, S. (2024). Comparison of CAT procedures at low ability levels: A simulation study and analysis in the context of students with disabilities. Bartın University Journal of Faculty of Education, 13(3), 547-559.
All articles published in the journal are open access and distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.