Research Article

The Effect of Test Design on Misrouting in Computerized Multistage Testing

Year 2023, Issue 21, 549-587, 31.10.2023
https://doi.org/10.46778/goputeb.1267319

Abstract

Computerized Multistage Testing (MST) is an adaptive testing approach in which the test taker progresses through the stages and modules of a pre-assembled panel according to their ability level. In MST, the test taker's responses to the module at each stage determine the module they are routed to in the next stage, and they are expected to be routed to the module that best fits their ability level. When a test taker is not routed to the module appropriate to their ability level, this is called misrouting. Misrouting is thought to affect both measurement accuracy and the test taker's test-taking psychology. Although it is very difficult to eliminate misrouting completely, it is assumed that it can be reduced through the basic components of the MST design. The purpose of this study is to determine the level of misrouting under different MST designs and to investigate the effects of changes in test design on the level of misrouting. As the main components considered to affect misrouting, the MST design [1-3, 1-2-3, 1-3-3], routing module design [Wide, Narrow], test length [12, 24, 36], and module length [L-S, M-M, S-L] (Long-Short, Medium-Medium, Short-Long) were manipulated. This study, which aims to describe the current situation, is descriptive research carried out using the simulation method. The results show that MST designs and their components can be effective in reducing misrouting. Three-stage MST designs offer lower misrouting and higher measurement accuracy than the two-stage design. Furthermore, increasing the test length and designing the routing module over a wide ability range reduce the misrouting rate. According to the measurement accuracy results, although the accuracy for misrouted test takers is lower, misrouting does not generally constitute a significant problem in MST. The ability levels of the misrouted test takers were found to be concentrated at the intersection points of the module information functions of adjacent modules, generally in the middle of the ability scale.
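The routing logic described in the abstract can be illustrated with a short sketch. The following Python snippet is not the authors' code: the Rasch model, a two-stage 1-3 panel, the hypothetical module difficulties, the grid-based provisional ability estimate, and the maximum-information routing rule are all illustrative assumptions. It shows how a misrouting rate can be counted by comparing the module chosen from the provisional estimate with the module that is most informative at the simulee's true ability.

import numpy as np

rng = np.random.default_rng(1)

def p_correct(theta, b):
    # Rasch probability of a correct response to items of difficulty b
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def module_info(theta, b):
    # Module information under the Rasch model: sum of p * (1 - p) over items
    p = p_correct(theta, b)
    return float(np.sum(p * (1.0 - p)))

# Stage 1: a single routing module spanning a wide difficulty range.
# Stage 2: easy / medium / hard modules (hypothetical item difficulties).
routing_b = np.linspace(-2.0, 2.0, 6)
stage2_b = {
    "easy": np.linspace(-2.5, -0.5, 6),
    "medium": np.linspace(-1.0, 1.0, 6),
    "hard": np.linspace(0.5, 2.5, 6),
}

grid = np.linspace(-4.0, 4.0, 161)      # grid for the provisional estimate
thetas = rng.normal(0.0, 1.0, 5000)     # true abilities of simulated test takers
misrouted = 0
for theta in thetas:
    # Simulate responses to the routing module, then estimate theta on the grid.
    resp = rng.random(routing_b.size) < p_correct(theta, routing_b)
    loglik = [np.sum(np.where(resp,
                              np.log(p_correct(t, routing_b)),
                              np.log(1.0 - p_correct(t, routing_b))))
              for t in grid]
    theta_hat = grid[int(np.argmax(loglik))]
    # Routed module: most informative at the estimate; target module: most
    # informative at the true ability. A mismatch is counted as misrouting.
    routed = max(stage2_b, key=lambda m: module_info(theta_hat, stage2_b[m]))
    target = max(stage2_b, key=lambda m: module_info(theta, stage2_b[m]))
    misrouted += routed != target

print(f"Misrouting rate: {misrouted / len(thetas):.3f}")

In such a sketch, misrouted simulees cluster around the ability values where the information functions of adjacent second-stage modules intersect, which is consistent with the pattern reported in the abstract.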

References

  • Breithaupt, K. J., Mills, C. N., & Melican, G. J. (2006). Facing the opportunities of the future. In D. Bartram & R. Hambleton (Eds.), Computer-based testing and the Internet: Issues and advances (pp. 219-251). John Wiley & Sons Ltd.
  • Cai, L., Albano, A. D., & Roussos, L. A. (2021). An investigation of item calibration methods in multistage testing. Measurement: Interdisciplinary Research and Perspectives, 19(3), 163–178. https://doi.org/10.1080/15366367.2021.1878778
  • Demir, S. (2022). The effect of item pool and selection algorithms on computerized classification testing (CCT) performance. Journal of Educational Technology and Online Learning, 5(3), 573-584. https://doi.org/10.31681/jetol.1099580
  • Erdem Kara, B. (2022). Yönlendirme yöntemlerinin çok aşamalı testler üzerindeki etkisi [Effect of routing methods on the performance of multi-stage tests]. Uluslararası Türk Eğitim Bilimleri Dergisi, 10(19), 343-354. https://doi.org/10.46778/goputeb.1123902
  • Erdem Kara, B., & Doğan, N. (2022). The effect of ratio of items indicating differential item functioning on computer adaptive and multi-stage tests. International Journal of Assessment Tools in Education, 9(3), 682–696. https://doi.org/10.21449/ijate.1105769
  • Erkuş, A. (2012). Psikolojide ölçme ve ölçek geliştirme [Measurement and scale development in psychology]. Pegem Akademi Yayınları.
  • Eroğlu, M. G., & Kelecioğlu, H. (2015). Bireyselleştirilmiş bilgisayarlı test uygulamalarında farklı sonlandırma kurallarının ölçme kesinliği ve test uzunluğu açısından karşılaştırılması [Comparison of different test termination rules in terms of measurement precision and test length in computerized adaptive testing]. Uludağ Üniversitesi Eğitim Fakültesi Dergisi, 28(1), 31-52. https://doi.org/10.19171/uuefd.87973
  • Feinberg, R. A., & Rubright, J. D. (2016). Conducting simulation studies in psychometrics. Educational Measurement: Issues and Practice, 35(2), 36–49. https://doi.org/10.1111/emip.12111
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (8th ed.). McGraw-Hill.
  • Harwell, M., Stone, C. A., Hsu, T. C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101-125. https://doi.org/10.1177/014662169602000201
  • Karamese, H. (2022). A comparison of final scoring methods under the multistage adaptive testing framework. [Unpublished doctoral dissertation, The University of Iowa].
  • Karatoprak Ersen, R., & Lee, W.-C. (2023). Pretest item calibration in computerized multistage adaptive testing. Journal of Educational Measurement. https://doi.org/10.1111/jedm.12361
  • Khorramdel, L., Pokropek, A., Joo, S. H., Kirsch, I., & Halderman, L. (2020). Examining gender DIF and gender differences in the PISA 2018 reading literacy scale: A partial invariance approach. Psychological Test and Assessment Modeling, 62(2), 179-231.
  • Kim, S., & Moses, T. (2014). An investigation of the impact of misrouting under two-stage multistage testing: A simulation study. ETS Research Report Series, 2014(1), 1–13. https://doi.org/10.1002/ets2.12000
  • Kim, S., Moses, T., & Yoo, H. H. (2015). A comparison of IRT proficiency estimation methods under adaptive multistage testing. Journal of Educational Measurement, 52(1), 70–79. https://doi.org/10.1111/jedm.12063
  • Kirsch, I., & Lennon, M. L. (2017). PIAAC: a new design for a new era. Large-Scale Assessments in Education, 5(1), 1-22. https://doi.org/10.1186/s40536-017-0046-6
  • Ling, G., Attali, Y., Finn, B., & Stone, E. A. (2017). Is a computerized adaptive test more motivating than a fixed-item test?. Applied Psychological Measurement, 41(7), 495–511. https://doi.org/10.1177/0146621617707556
  • Luo, X., & Kim, D. (2018). A top-down approach to designing the computerized adaptive multistage test. Journal of Educational Measurement, 55(2), 243–263. https://doi.org/10.1111/jedm.12174
  • Ma, Y. C. (2020). Investigating hybrid test designs in passage-based adaptive tests. [Doctoral dissertation, The University of Iowa]. https://doi.org/10.17077/etd.005590
  • Magis, D., Yan, D., & von Davier, A. A. (2017). Computerized adaptive and multistage testing with R: Using packages catR and mstR. Springer.
  • Martin, A. J., & Lazendic, G. (2018). Computer-adaptive testing: Implications for students’ achievement, motivation, engagement, and subjective test experience. Journal of Educational Psychology, 110(1), 27–45. https://doi.org/10.1037/edu0000205
  • Mooney, C. Z. (1997). Monte Carlo simulation. Sage.
  • Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074–2102. https://doi.org/10.1002/sim.8086
  • Patsula, L. N. (1999). A comparison of computerized adaptive testing and multistage testing. [Doctoral dissertation, University of Massachusetts Amherst].
  • Rotou, O., Patsula, L., Steffen, M., & Rizavi, S. (2007). Comparison of multistage tests with computerized adaptive and paper-and-pencil tests. ETS Research Report Series, 2007(1), 1–27. https://doi.org/10.1002/j.2333-8504.2007.tb02046.x
  • Sari, H. İ., & Huggins-Manley, A. C. (2017). Examining content control in adaptive tests: Computerized adaptive testing vs. computerized adaptive multistage testing. Educational Sciences: Theory & Practice, 17(5), 1759–1781. https://doi.org/10.12738/estp.2017.5.0484
  • Şahin, M. G. (2020). Analyzing different module characteristics in computer adaptive multistage testing. International Journal of Assessment Tools in Education, 7(2), 191–206. https://doi.org/10.21449/ijate.676947
  • Şenel, S. (2021). Bilgisayar ortamında bireye uyarlanmış testler [Computerized adaptive testing]. Pegem Yayınları.
  • Thompson, N. A., & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research, and Evaluation, 16(1). https://doi.org/10.7275/wqzt-9427
  • Wainer, H., Mislevy, R. J., Steinberg, L., & Thissen, D. (2001). Review of computerized adaptive testing: a primer. Language Learning & Technology, 5(2).
  • Wang, K. (2017). A fair comparison of the performance of computerized adaptive testing and multistage adaptive testing. [Doctoral dissertation, Michigan State University].
  • Yigiter, M. S., & Dogan, N. (2023). Computerized multistage testing: Principles, designs and practices with R. Measurement: Interdisciplinary Research and Perspectives, 21(4), 254–277. https://doi.org/10.1080/15366367.2022.2158017
  • Zenisky, A., Hambleton, R. K., & Luecht, R. M. (2010). Multistage testing: Issues, designs, and research. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 355-372). Springer.
There are 33 references in total.

Details

Primary Language English
Subjects Measurement and Evaluation in Education (Other)
Section Articles
Authors

Mahmut Sami Yiğiter 0000-0002-2896-0201

Nuri Doğan 0000-0001-6274-2016

Publication Date October 31, 2023
Submission Date March 18, 2023
Acceptance Date April 11, 2023
Published in Issue Year 2023

Cite

APA Yiğiter, M. S., & Doğan, N. (2023). The Effect of Test Design on Misrouting in Computerized Multistage Testing. International Journal of Turkish Education Sciences, 2023(21), 549-587. https://doi.org/10.46778/goputeb.1267319