INVESTIGATION OF CLASSIFICATION INDICES ON TIMSS-2015 MATHEMATIC-SUBTEST THROUGH BAYESIAN AND NONBAYESIAN ESTIMATION METHODS

Serpil Çelikten; Mehtap Çakan

doi:10.17522/balikesirnef.566446

Research Article

Year 2019, Volume: 13 Issue: 1, 105 - 124, 30.06.2019

Abstract

The aim of this study is to compare, under different sample conditions and
within the framework of item response theory (IRT), a modern test theory, the
classification accuracy and classification consistency indices obtained when
individuals are classified according to ability estimates from the non-Bayesian
estimation methods MLE and WLE and the Bayesian estimation methods MAP and EAP.
To this end, ability estimates based on the MLE, WLE, MAP, and EAP estimation
methods were obtained within the IRT framework for each sample condition. Then,
for each condition, classification accuracy and consistency indices were
obtained using Rudner's approach, one of the IRT-based classification
approaches. According to the findings, the classification indices based on
non-Bayesian ability estimates were more accurate and consistent than those
based on Bayesian methods. Among the non-Bayesian methods, the most accurate
and consistent classification indices were obtained from abilities estimated
with MLE. However, examination of the pairwise comparison tests and practical
significance values showed that all of the statistically significant effects
were small in practice.
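
As a rough illustration of how Rudner-type indices can be computed from ability estimates, the following Python sketch assumes that each estimate has a normally distributed error with a known standard error. The function names, the cut score, and the example values are hypothetical and are not taken from the study. Accuracy is taken here as the average probability that the true ability lies in the category assigned from the estimate, and consistency as the average probability that two independent classifications would agree.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(theta_hat, se, cuts):
    """Probability that the true ability falls in each category,
    assuming true ability ~ Normal(theta_hat, se**2)."""
    bounds = [-math.inf] + sorted(cuts) + [math.inf]
    cdf = [normal_cdf((b - theta_hat) / se) for b in bounds]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]


def rudner_indices(theta_hats, ses, cuts):
    """Return (accuracy, consistency) averaged over examinees."""
    accuracy = 0.0
    consistency = 0.0
    for theta_hat, se in zip(theta_hats, ses):
        probs = category_probs(theta_hat, se, cuts)
        # Category assigned from the point estimate (0 = lowest).
        observed = sum(theta_hat >= c for c in cuts)
        accuracy += probs[observed]
        # Chance that two independent classifications agree.
        consistency += sum(p * p for p in probs)
    n = len(theta_hats)
    return accuracy / n, consistency / n


# Hypothetical example: three examinees, one cut score at 0.0.
acc, cons = rudner_indices([-1.2, 0.1, 1.5], [0.4, 0.4, 0.5], [0.0])
print(f"accuracy={acc:.3f}, consistency={cons:.3f}")
```

Examinees whose estimates lie close to a cut score, or whose standard errors are large, pull both indices down, which is why the choice of ability estimator can affect the resulting values.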

Keywords

Bayesian and non-Bayesian estimation methods, classification accuracy and consistency, item response theory

References

Altun, M. (2010). Matematik Öğretimi [Mathematics teaching]. Bursa: Pegem Akademi.
Büyüköztürk, S., Çakan, M., Tan, S., & Atar, H. Y. (2014). TIMSS 2011 ulusal matematik ve fen raporu 8. sınıflar [TIMSS 2011 national mathematics and science report: 8th grade]. Retrieved from http://timss.meb.gov.tr/wp-content/uploads/TIMSS-2011-8-Sinif.pdf
Barnett, D. W., & Macmann, G. M. (1992). Decision reliability and validity: contribution and limitations of alternative assessment systems. The Journal of Special Education, 25(4), 431-452.
Bourque, M. L., Goodman, D., Hambleton, R. K., & Han, N. (2004). Reliability estimates for the ABTE tests in elementary education, professional teaching knowledge, secondary mathematics and English/language arts (Final Report). Leesburg, VA: Mid-Atlantic Psychometric Services.
Cizek, G. J., & Bunch, M. B. (2007). Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests. London: Sage.
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20(1), 37–46.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Ercikan, K., & Julian, M. (2002). Classification accuracy of assigning student performance to proficiency levels: Guidelines for assessment design. Applied Measurement in Education, 15, 269-294.
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2008). How to design and evaluate research in education (7th ed.). New York: McGraw-Hill.
Guo, F. (2006). Expected classification accuracy using the latent distribution. Practical Assessment, Research & Evaluation, 11(6), 1-9.
Hambleton, R. K., & Novick, M. (1973). Toward an integration of theory and method for criterion-referenced tests. Journal of Educational Measurement, 10(3), 159-170.
Huynh, H. (1976). On the reliability of decisions in domain-referenced testing. Journal of Educational Measurement, 13, 253-264.
Kadane, J. B. (2015). Bayesian methods for prevention research. Prevention Science, 16, 1017–1025. DOI: 10.1007/s11121-014-0531-x
Lathrop, Q. N., & Cheng, Y. (2014). A nonparametric approach to estimate classification accuracy and consistency. Journal of Educational Measurement, 51, 318-334.
Lee, W., Hanson, B. A., & Brennan, R. L. (2000). Procedures for computing classification consistency and accuracy indices with multiple categories. ACT, Inc., Research Report.
Lee, S. Y., & Song, X. Y. (2004). Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes. Multivariate Behavioral Research, 39, 653–686. DOI: 10.1207/s15327906mbr3904_4
Lee, W. (2010). Classification consistency and accuracy for complex assessments using item response theory. Journal of Educational Measurement, 47, 1-17.
Livingston, S. A., & Lewis, C. (1995). Estimating the Consistency and Accuracy of Classifications Based on Test Scores. Journal of Educational Measurement, 32(2), 179–197.
Mislevy, R. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.
Seong, T. J., Kim, S. H., & Cohen, A. S. (1997, March). A comparison of procedures for ability estimation under the graded response model. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago.
Pallant, J. (2007). SPSS Survival Manual: A Step by Step Guide to Data Analysis using SPSS for Windows. New York: Open University Press.
Rudner, L. M. (2001). Computing the expected proportions of misclassified examinees. Practical Assessment, Research & Evaluation, 7(14). Available online: http://pareonline.net/getvn.asp?v=7&n=14
Rudner, L. M. (2005). Expected classification accuracy. Practical Assessment, Research & Evaluation, 10(13). Available online: http://pareonline.net/getvn.asp?v=10&n=13
Sireci, S. G., Robin, F., & Patelis, T. (1999). Using cluster analysis to facilitate standard setting. Applied Measurement in Education, 12, 301-325.
Subkoviak, M. J. (1976). Estimating reliability from a single administration of a criterion-referenced test. Journal of Educational Measurement, 13, 265-276.
Swaminathan, H., Hambleton, R. K., & Algina, J. (1974). Reliability of Criterion-Referenced Tests: A Decision-Theoretic Formulation. Journal of Educational Measurement, 11(4), 263–267.
Yang, X., Poggio, J. C., & Glasnapp, D. R. (2006). Effects of estimation bias on multiple-category classification with an IRT-based adaptive classification procedure. Educational and Psychological Measurement, 31, 275-291.
Wyse, A. E., & Hao, S. (2012). An evaluation of item response theory classification accuracy and consistency indices. Applied Psychological Measurement, 36, 602-624.
Zhang, S., Du, J., Chen, P., Xin, T., & Chen, F. (2017). Using Procedure Based on Item Response Theory to Evaluate Classification Consistency Indices in the Practice of Large-Scale Assessment. Frontiers in Psychology, 8, 1676. http://doi.org/10.3389/fpsyg.2017.01676

INVESTIGATION OF CLASSIFICATION INDICES ON TIMSS-2015 MATHEMATIC-SUBTEST THROUGH BAYESIAN AND NONBAYESIAN ESTIMATION METHODS

Abstract

The purpose of this study is to compare classification accuracy and
consistency indices across different sample sizes, based on the Bayesian
estimation methods MAP and EAP and the non-Bayesian estimation methods MLE and
WLE, within the framework of IRT. To this end, ability estimates based on MLE,
WLE, MAP, and EAP were obtained for each sample size. Then, for each sample
size condition, classification accuracy and consistency indices were calculated
using Rudner's approach. The findings show that classification indices based on
non-Bayesian methods are more accurate and consistent than those based on
Bayesian methods. Among the non-Bayesian methods, MLE yields more accurate and
consistent classification indices than WLE. However, the post hoc tests and
effect sizes show that all pairs with a significant difference have a small
effect in practice.
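
To make the difference between the non-Bayesian and Bayesian ability estimates concrete, the Python sketch below computes MLE, MAP, and EAP estimates for a single response pattern. It is only a minimal illustration under stated assumptions: a two-parameter logistic model, a standard normal prior, a simple grid search instead of an iterative solver, and hypothetical item parameters. None of these choices are taken from the study, and WLE is left out for brevity.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def estimate_ability(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-based MLE, MAP, and EAP with a standard normal prior."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    loglik = [log_likelihood(t, responses, items) for t in grid]
    # Log-posterior up to a constant: log-likelihood + log N(0, 1) prior.
    logpost = [ll - 0.5 * t * t for ll, t in zip(loglik, grid)]
    mle = grid[max(range(steps), key=loglik.__getitem__)]
    map_est = grid[max(range(steps), key=logpost.__getitem__)]
    # EAP: posterior mean, computed with numerically stable weights.
    peak = max(logpost)
    weights = [math.exp(lp - peak) for lp in logpost]
    eap = sum(t * w for t, w in zip(grid, weights)) / sum(weights)
    return {"MLE": mle, "MAP": map_est, "EAP": eap}


# Hypothetical items as (discrimination a, difficulty b) pairs.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]
estimates = estimate_ability([1, 1, 1, 0], items)
print(estimates)
```

With a short test like this one, the prior pulls the MAP and EAP estimates toward zero relative to the MLE. Note that the MLE is not finite for all-correct or all-incorrect patterns, in which case the grid search simply returns a boundary value.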

Keywords

Bayesian and non-Bayesian estimation methods, classification accuracy and consistency indices, item response theory

There are 28 references in total.

Details

Primary Language	Turkish
Section	Articles
Authors	Serpil Çelikten 0000-0003-3868-3807 Mehtap Çakan 0000-0001-6602-6180
Publication Date	June 30, 2019
Submission Date	May 16, 2019
Published in Issue	Year 2019 Volume: 13 Issue: 1

Cite

APA	Çelikten, S., & Çakan, M. (2019). BAYESIAN VE NONBAYESIAN KESTİRİM YÖNTEMLERİNE DAYALI OLARAK SINIFLAMA İNDEKSLERİNİN TIMSS-2015 MATEMATİK TESTİ ÜZERİNDE İNCELENMESİ. Necatibey Faculty of Education Electronic Journal of Science and Mathematics Education, 13(1), 105-124. https://doi.org/10.17522/balikesirnef.566446

Cited By

Drawing a Sample with Desired Properties from Population in R Package “drawsample”

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

https://doi.org/10.21031/epod.790449

Comparison of item response theory ability and item parameters according to classical and Bayesian estimation methods

International Journal of Assessment Tools in Education

https://doi.org/10.21449/ijate.1290831
