Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması

Ceylan Gündeğer; Nuri Doğan

doi:10.21031/epod.401077

Research Article

Year 2018, Volume 9, Issue 2, 161-177, 30.06.2018

Cited By: 3

Abstract

This study aimed to determine how the efficiency of Computerized Adaptive Classification Tests (CACT) changes with the classification criterion, the item selection method, and the ability estimation method. To this end, the three-parameter logistic model was taken as the basis; a pool of 500 items was built to give high information at and around the specified cut-point; abilities for 3000 examinees were generated from N(0,1); and the examinees' item response patterns were generated randomly in R. Forty-eight conditions were formed by crossing the classification criteria Sequential Probability Ratio Test (SPRT), Generalized Likelihood Ratio (GLR), and Confidence Interval (CI); the ability estimation methods Expected a Posteriori (EAP) and Weighted Likelihood Estimation (WLE); and the item selection methods Maximum Fisher Information (MFI) and Kullback-Leibler Information (KLI), each applied at the cut-point (CP) and at the estimated ability (EA). At the end of the CACT simulation run in R, the average test length (ATL), average classification accuracy (ACA), correlation between examinees' true and estimated ability levels (r), bias, RMSE, and mean absolute error (MAE) were averaged over 25 replications. According to the results, GLR and CI performed better in terms of test efficiency, whereas SPRT performed better in terms of measurement precision; test efficiency increased as the indifference region of the classification criteria widened or the error level decreased; and all classification criteria reached quite high classification accuracy in every condition. In terms of the correlation between true and estimated ability levels, both EAP and WLE produced successful estimates, but EAP performed better in terms of measurement precision. All item selection methods worked similarly to one another, but MFI-EA performed better in all conditions on all dependent variables.
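The data-generation step and the precision indices named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names are hypothetical, and the 3PL model is written without the scaling constant D, which the abstract does not specify.

```python
import math
import random


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    a = discrimination, b = difficulty, c = pseudo-guessing.
    Assumption: logistic metric, no scaling constant D.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(thetas, items, rng):
    """Draw a 0/1 response for every examinee-item pair.

    thetas: list of true abilities; items: list of (a, b, c) tuples.
    A response is scored 1 when a uniform draw falls below P(theta).
    """
    return [
        [1 if rng.random() < p_3pl(theta, a, b, c) else 0 for (a, b, c) in items]
        for theta in thetas
    ]


def precision_metrics(true_thetas, est_thetas):
    """Return (bias, RMSE, MAE) of the estimates against the true abilities."""
    errors = [est - true for true, est in zip(true_thetas, est_thetas)]
    n = len(errors)
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    return bias, rmse, mae
```

In a study of this kind, abilities could be drawn with `rng.gauss(0.0, 1.0)` for each examinee, and `precision_metrics` would be applied to the ability estimates produced at the end of each simulated test.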

Keywords

computerized adaptive classification testing, classification criterion, ability estimation, item selection method, measurement precision

References

Boyd, A. M. (2003). Strategies for controlling testlet exposure rates in computerized adaptive testing systems. (Doctoral Dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3110732)
Cheng, P. E. & Liou, M. (2000). Estimation of trait level in computerized adaptive testing. Applied Psychological Measurement, 24(3), 257-265.
Dooley, K. (2002). Simulation research methods. In J. Baum (Ed.). Companion to organizations. London: Blackwell.
Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23(3), 249-261.
Eggen, T. J. H. M. & Straetmans, G. J. J. M. (2000). Computerized adaptive testing for classifying examinees into three categories. Educational and Psychological Measurement, 60(5), 713-734.
Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. London: Lawrence Erlbaum Associates Publishers.
Hambleton, R. K. & Swaminathan, H. (1985). Item response theory: principles and applications. Boston: Kluwer Nijhoff Publishing.
Lau, C. A. & Wang, T. (1998, April). Comparing and combining dichotomous and polytomous items with SPRT procedure in computerized classification testing. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Lau, C. A. & Wang, T. (1999, April). Computerized classification testing under practical constraints with a polytomous model. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.
Lin, C. J. & Spray, J. (2000). Effects of item-selection criteria on classification testing with the sequential probability ratio test. ACT Research Report Series 2000-8. [Online: https://eric.ed.gov/?id=ED445066, Accessed date: 26.2.2014.]
McBride, J. R. (1985). Computerized adaptive testing. Educational Leadership, 43(2), 25-28.
Miller, I. & Miller, M. (2004). John E. Freund’s mathematical statistics with applications. New Jersey: Prentice Hall.
Nydick, S. W., Nozawa, Y. & Zhu, R. (2012, April). Accuracy and efficiency in classifying examinees using computerized adaptive tests: an application to a large scale test. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada.
Nydick, S. W. (2013). Multidimensional mastery testing with CAT. (Doctoral Dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3607925)
Nydick, S. W. (2014). catIrt: An R Package for Simulating IRT-Based Computerized Adaptive Tests. [Online: https://cran.r-project.org/web/packages/catIrt/catIrt.pdf, Accessed date: 20.5.2015.]
R Core Team. (2013). R: A language and environment for statistical computing (Version 3.0.1). Vienna, Austria: R Foundation for Statistical Computing. Online: http://www.R-project.org/
Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.). New horizons in testing: latent trait theory and computerized adaptive testing. New York: Academic Press.
Spray, J. A. & Reckase, M. D. (1994, April). The selection of test items for decision making with a computer adaptive test. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Şencan, H. (2005). Sosyal ve davranışsal ölçümlerde güvenirlilik ve geçerlilik. Ankara: Seçkin Yayıncılık.
Thompson, N. A. & Ro, S. (2007). Computerized classification testing with composite hypotheses. In D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. Retrieved [22.3.2014] from www.psych.umn.edu/psylabs/CATCentral/
Thompson, N. A. (2007b). A practitioner’s guide for variable-length computerized classification testing. Practical Assessment, Research & Evaluation, 12(1), 1-13.
Thompson, N. A. (2009). Item selection in computerized classification testing. Educational and Psychological Measurement, 69(5), 778-793.
Thompson, N. A. (2011). Termination criteria for computerized classification testing. Practical Assessment, Research & Evaluation, 16(4), 1-7.
van der Linden, W. J. (1990). Applications of decision theory to test-based decision making. In R. K. Hambleton & J. N. Zaal (Eds.). Advances in educational and psychological measurement. Massachusetts: Kluwer-Nijhoff.
Wainer, H. (2000). Computerized adaptive testing: a primer. New Jersey: Lawrence Erlbaum Associates.
Wald, A. (1947). Sequential analysis. New York: John Wiley.
Wang, T., Hanson, B. A. & Lau, C. A. (1999). Reducing bias in CAT trait estimation: a comparison of approaches. Applied Psychological Measurement, 23(3), 263-278.
Wang, S. & Wang, T. (2001). Precision of Warm’s weighted likelihood estimates for a polytomous model in computerized adaptive testing. Applied Psychological Measurement, 25(4), 317-331.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450.
Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6(4), 473-492.
Weiss, D. J. & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361-375.
Yang, X., Poggio, J. C. & Glasnapp, D. R. (2006). Effects of estimation bias on multiple category classification with an IRT-based adaptive classification procedure. Educational and Psychological Measurement, 66(4), 545-564.
Yi, Q., Wang, T. & Ban, J. (2000). Effects of scale transformation and test termination rule on the precision of ability estimates in CAT. ACT Research Report Series, 2000-2. [Online: http://onlinelibrary.wiley.com/doi/10.1111/j.1745-3984.2001.tb01127.x/full, Accessed date: 7.12.2015.]

A Comparison of Computerized Adaptive Classification Test Criteria in Terms of Test Efficiency and Measurement Precision

Abstract

This study aimed to determine how the efficiency of Computerized Adaptive Classification Testing (CACT) changes with the classification criterion, the item selection method, and the ability estimation method. For this purpose, a pool of 500 items was generated under the three-parameter logistic model (3PLM) so as to be highly informative at and around the chosen cut-point; abilities for 3000 examinees were drawn from a normal distribution, N(0,1); and item response patterns were generated randomly in R by Monte Carlo simulation. Three classification criteria, the Sequential Probability Ratio Test (SPRT), the Generalized Likelihood Ratio (GLR), and the Confidence Interval (CI) method; two ability estimation methods, Expected a Posteriori (EAP) and Weighted Likelihood Estimation (WLE); and two item selection methods, Maximum Fisher Information (MFI) and Kullback-Leibler Information (KLI), each applied at the cut-point (CP) and at the estimated ability (EA), were crossed to give the 48 conditions investigated. At the end of the CACT simulations in R, Average Test Length (ATL), Average Classification Accuracy (ACA), the correlation between true and estimated thetas (r), bias, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) were averaged over 25 replications. The results show that the GLR and CI criteria perform better in terms of test efficiency, whereas the SPRT performs better in terms of measurement precision; that test efficiency increases as the indifference region of the classification criteria widens or the error value decreases; and that all classification criteria reach considerably high classification accuracy in every condition. Both ability estimation methods estimate ability successfully as judged by the correlation between true and estimated thetas (r), although EAP performs relatively better in terms of measurement precision. All item selection methods work similarly to one another, but MFI-EA performs better in all conditions on all dependent variables.
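As a rough illustration of one cell of this design, the sketch below pairs the SPRT classification criterion with maximum Fisher information item selection at the cut-point (MFI-CP). It is a minimal Python sketch under stated assumptions, not the authors' R implementation: the function names, the indifference-region half-width `delta`, the error rates `alpha` and `beta`, and the maximum test length are all hypothetical, and the 3PL model is written without the scaling constant D.

```python
import math
import random


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (assumption: no constant D)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def fisher_info(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def sprt_classify(true_theta, items, cut=0.0, delta=0.2,
                  alpha=0.05, beta=0.05, max_items=50, rng=None):
    """Classify one simulated examinee as 'above' or 'below' the cut-point.

    The SPRT compares theta = cut + delta against theta = cut - delta and
    stops once the log-likelihood ratio crosses one of Wald's bounds.
    Returns (decision, number of items administered).
    """
    rng = rng or random.Random()
    theta_lo, theta_hi = cut - delta, cut + delta
    upper = math.log((1.0 - beta) / alpha)   # cross it: decide 'above'
    lower = math.log(beta / (1.0 - alpha))   # cross it: decide 'below'
    # MFI-CP: administer items in order of information at the cut-point.
    ranked = sorted(items, key=lambda it: fisher_info(cut, *it), reverse=True)
    llr, used = 0.0, 0
    for a, b, c in ranked[:max_items]:
        # Simulate the response from the examinee's true ability.
        x = 1 if rng.random() < p_3pl(true_theta, a, b, c) else 0
        p_hi, p_lo = p_3pl(theta_hi, a, b, c), p_3pl(theta_lo, a, b, c)
        llr += math.log(p_hi / p_lo) if x else math.log((1.0 - p_hi) / (1.0 - p_lo))
        used += 1
        if llr >= upper:
            return "above", used
        if llr <= lower:
            return "below", used
    # Items exhausted: forced decision by the sign of the ratio.
    return ("above" if llr >= 0.0 else "below"), used
```

Averaging the second return value over examinees gives an average test length, and comparing the decisions with `true_theta >= cut` gives a classification accuracy, which are the two efficiency indices the abstract reports. Widening `delta` moves the two hypotheses further apart, so each response shifts the ratio more and the bounds are typically reached with fewer items.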

Keywords

computerized adaptive classification testing, classification criteria, ability estimation, item selection method, measurement precision

There are 33 citations in total.

Details

Primary Language	Turkish
Journal Section	Articles
Authors	Ceylan Gündeğer 0000-0003-3572-1708 Nuri Doğan 0000-0001-6274-2016
Publication Date	June 30, 2018
Acceptance Date	May 22, 2018
Published in Issue	Year 2018

Cite

APA	Gündeğer, C., & Doğan, N. (2018). Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması. Journal of Measurement and Evaluation in Education and Psychology, 9(2), 161-177. https://doi.org/10.21031/epod.401077

Cited By

Bilgisayarda Bireyselleştirilmiş Sınıflama Testinde Çok Kategorili Sınıflama İçin Sınıflama Koşullarının İncelenmesi

Uludağ Üniversitesi Eğitim Fakültesi Dergisi

https://doi.org/10.19171/uefad.1357800

Comparison of Different Computerized Adaptive Testing Approaches with Shadow Test Under Different Test Length and Ability Estimation Method Conditions

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

https://doi.org/10.21031/epod.1202599

Investigation of Classification Accuracy, Test Length and Measurement Precision at Computerized Adaptive Classification Tests

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

https://doi.org/10.21031/epod.787865
