Açık Uçlu Maddelerin Puanlanmasında Dereceli Puanlama Anahtarı Türünün Puanlayıcı Davranışlarına Etkisi

Umur Öç; Esra Onkun Özgür; İsmail Karakaya

doi:10.37217/tebd.1501178

Research Article

The Effect of the Type of Rubric on Rater Behavior in Scoring Open-Ended Items

Year 2024, Volume: 22 Issue: 3, 2123 - 2151, 31.12.2024

Abstract

This study uses the many-facet Rasch model to examine how analytical and holistic rubrics affect rater behaviors in the scoring of a mathematics achievement test made up of non-routine open-ended items. The study group consisted of 20 eighth-grade students at a public school, who took the achievement test, and 16 mathematics teachers, who scored the students' responses. The survey model, one of the descriptive research methods, was used. The achievement test, prepared by Onkun-Özgür (2024) and consisting of 15 different non-routine open-ended mathematics problems, was administered to the students in two sessions over two days. The data obtained from the raters were analyzed with the many-facet Rasch model, and the raters' central tendency, bias, and halo effect behaviors were examined. The findings showed that model-data fit was achieved on the rater, examinee, and item facets in all scorings, and that raters who used the holistic rubric showed fewer rater effects than raters who used the analytical rubric.
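For readers unfamiliar with the model named in the abstract, the following is a minimal sketch of the standard three-facet rating scale formulation, in which the log-odds of scoring in category k rather than k-1 equal examinee ability minus item difficulty minus rater severity minus the threshold of step k. It is not the authors' analysis; the function names and every numeric value below are hypothetical and chosen only for illustration.

```python
import math


def mfrm_category_probs(ability, item_difficulty, rater_severity, thresholds):
    """Category probabilities under a three-facet rating scale Rasch model.

    thresholds[k-1] is the step difficulty of moving from category k-1 to k.
    All arguments are in logits and are hypothetical illustration values.
    """
    # Cumulative sums of (ability - difficulty - severity - threshold_k).
    # Category 0 corresponds to the empty sum, which is 0.
    logits = [0.0]
    for tau in thresholds:
        step = ability - item_difficulty - rater_severity - tau
        logits.append(logits[-1] + step)
    # Normalize with a softmax, shifted by the maximum for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_score(probs):
    """Expected rating given the category probabilities (categories 0..K)."""
    return sum(k * p for k, p in enumerate(probs))


# Same examinee and item, scored by a lenient and a severe rater (assumed values).
steps = [-1.0, 0.0, 1.0]
lenient = mfrm_category_probs(0.5, 0.0, -1.0, steps)
severe = mfrm_category_probs(0.5, 0.0, 1.0, steps)
print(expected_score(lenient), expected_score(severe))
```

In this sketch a more severe rater lowers the expected rating for the same response, which is the kind of rater effect the model separates from examinee ability and item difficulty.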

Keywords

Analytical rubric, Holistic rubric, Rater behaviors, Rater bias, Many-facet Rasch model

References

Abu Kassim, N. L. (2007). Exploring rater judging behaviour using the many-facet Rasch model. Paper presented at the Second Biennial International Conference on Teaching and Learning of English in Asia: Exploring New Frontiers (TELiA2), Universiti Utara, Malaysia. Retrieved from https://repo.uum.edu.my/id/eprint/3212/
Altun, M. (2020). Matematik okuryazarlığı el kitabı. Aktüel Alfa Akademi.
Anderson, L. W. & Krathwohl, D. R. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives: Complete edition. Addison Wesley Longman.
Aslanoğlu, A. E. (2022). Açık uçlu maddelerin hazırlanması ve incelenmesi. In İ. Karakaya (Ed.), Açık uçlu soruların hazırlanması, uygulanması ve değerlendirilmesi (1st ed., pp. 2-27). Pegem Akademi.
Atılgan, H. (2009). Test geliştirme. In H. Atılgan, A. Kan, & N. Doğan (Eds.), Eğitimde ölçme ve değerlendirme (4th ed., pp. 315-348). Anı.
Baird, J., Hayes, M., Johnson, R., Johnson, S., & Lamprianou, I. (2013). Marker effects and examination reliability: A comparative exploration from the perspectives of generalizability theory, Rasch modelling and multilevel modelling (Research Report No. 5261). Retrieved from http://dera.ioe.ac.uk/17683/1/2013-01-21-marker-effects-and-examination-reliability.pdf
Baker, F. B. & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques. Marcel Dekker.
Bıkmaz-Bilgen, Ö. & Doğan, N. (2017). Puanlayıcılar arası güvenirlik belirleme tekniklerinin karşılaştırılması. Journal of Measurement and Evaluation in Education and Psychology, 8(1), 63-78. https://doi.org/10.21031/epod.294847
Crocker, L. & Algina, J. (1986). Introduction to classical and modern test theory. Cengage Learning.
Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58(4), 438-481. https://doi.org/10.3102/00346543058004438
DeMars, C. (2010). Item response theory. Oxford University Press.
Dunbar, N. E., Brooks, C. F., & Kubicka-Miller, T. (2006). Oral communication skills in higher education: Using a performance-based evaluation rubric to assess communication skills. Innovative Higher Education, 31(2), 115-128. https://doi.org/10.1007/s10755-006-9012-x
Eckes, T. (2015). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments. Peter Lang.
Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Multivariate Applications Books Series. Lawrence Erlbaum.
Esfandiari, R. (2015). Rater errors among peer-assessors: Applying the many-facet Rasch measurement model. Iranian Journal of Applied Linguistics, 18(2), 77-107. https://doi.org/10.18869/acadpub.ijal.18.2.77
Esfandiari, R. (2021). Rater-mediated assessment of Iranian undergraduate students’ college essays: Many-facet Rasch modelling. Journal of Applied Linguistics and Applied Literature: Dynamics and Advances, 9(1), 93-119. https://doi.org/10.22049/jalda.2021.27032.1234
Farrokhi, F., Esfandiari, R., & Dalili, M. V. (2011). Applying the many-facet Rasch model to detect centrality in self-assessment, peer-assessment and teacher assessment. World Applied Sciences Journal, 15(11), 76-83.
Farrokhi, F., Esfandiari, R., & Schaefer, E. (2012). A many-facet Rasch measurement of differential rater severity/leniency in three types of assessment. JALT Journal, 34(1), 79-101.
Haladyna, T. M. (1997). Writing test items to evaluate higher order thinking. Allyn and Bacon.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333. https://doi.org/10.1207/S15324818AME1503_5
Haladyna, T. M. & Rodriguez, M. C. (2013). Developing and validating test items. Taylor & Francis.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage.
Harvey, R. J. & Hammer, A. L. (1999). Item response theory. The Counseling Psychologist, 27(3), 353-383.
Jones, E. & Bergin, C. (2019). Evaluating teacher effectiveness using classroom observations: A Rasch analysis of the rater effects of principals. Educational Assessment, 24(2), 91-118.
Jonsson, A. & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130-144. https://doi.org/10.1016/j.edurev.2007.05.002
Karakaya, İ. (2012). Bilimsel araştırma yöntemleri. In A. Tanrıöğen (Ed.), Bilimsel araştırma yöntemleri (pp. 57-83). Anı.
Karakaya, İ. & Şata, M. (2022). Açık uçlu maddelerin hazırlanması ve incelenmesi. In İ. Karakaya (Ed.), Açık uçlu soruların hazırlanması, uygulanması ve değerlendirilmesi (1st ed., pp. 28-39). Pegem Akademi.
Kilpatrick, J. & Lerman, S. (2020). Education of professional development providers (for educators of practicing teachers). In S. Lerman (Ed.), Encyclopedia of mathematics education (pp. 262-262). Springer International.
Kutlu, Ö., Doğan, D. C., & Karakaya, İ. (2017). Ölçme ve değerlendirme performansa ve portfolyoya dayalı durum belirleme. Pegem Akademi.
Linacre, J. M. (1989). Many-faceted Rasch measurement (Doctoral dissertation). ProQuest Dissertations & Theses Global database. (T-30889)
Linacre, J. M. (2014). A user’s guide to FACETS: Rasch-model computer programs. Winsteps.
Linacre, J. M. (2023). Facets computer program for many-facet Rasch measurement [version 3.85.1].
Linlin, C. (2020). Comparison of automatic and expert teachers' rating of computerized English listening-speaking test. English Language Teaching, 13(1), 18-30.
McNamara, T. F. (1996). Measuring second language performance. Longman.
Moskal, B. M. & Leydens, J. A. (2000). Scoring rubric development: Validity and reliability. Practical Assessment, Research, and Evaluation, 7(10), 71-81. https://doi.org/10.7275/q7rm-gg74
Myford, C. M. & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189-227.
Nitko, A. J. & Brookhart, S. M. (2014). Educational assessment of students (6th ed.). Pearson Education.
Onkun-Özgür, E. (2024). Dereceli puanlama anahtarı türünün rutin olmayan matematik problemlerinin puanlanmasında puanlayıcı davranışları üzerine etkisinin incelenmesi (Master's thesis). Retrieved from https://tez.yok.gov.tr
Popham, W. J. (2001). Classroom assessment: What teachers need to know. Allyn and Bacon.
Romagnano, L. (2001). The myth of objectivity in mathematics assessment. Principles and Standards for School Mathematics (NCTM), 94(1), 22.
Royal, K. D. & Hecker, K. G. (2016). Rater errors in clinical performance assessments. Journal of Veterinary Medical Education, 43(1), 5-8.
Saal, F. E., Downey, R. G., & Lahey, M. A. (1980). Rating the ratings: Assessing the psychometric quality of rating data. Psychological Bulletin, 88(2), 413-428. https://doi.org/10.1037/0033-2909.88.2.413
Şata, M. (2019). Performans değerlendirme sürecinde puanlayıcı eğitiminin puanlayıcı davranışları üzerindeki etkisinin incelenmesi (Doctoral dissertation). Retrieved from https://tez.yok.gov.tr
Şata, M., Karakaya, İ., & Erman-Aslanoğlu, A. (2020). Evaluation of university students’ rating behaviors in self and peer rating process via many facet Rasch model. Eurasian Journal of Educational Research, 20(89), 25-46.
Tuckman, B. W. (1991). Evaluating the alternative to multiple-choice testing for teachers. Contemporary Education, 62(4), 299.
Tursynbayeva, N., Öç, U., & Karakaya, İ. (2024). The effect of rater training on rating behaviors in peer assessment among secondary school students. International Journal of Assessment Tools in Education, 11(3), 507-523. https://doi.org/10.21449/ijate.1438798
Wainer, H. & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), 103-118. https://doi.org/10.1207/s15324818ame0602_1
Yılmaz, F. N. (2017). Analysis of the rater effects in the rating of diagnostic trees prepared by teacher candidates by the many-facet Rasch model. JEP, 8(18).

There are 48 references in total.

Details

Primary Language	Turkish
Subjects	Classroom Measurement Practices
Section	Articles
Authors	Umur Öç 0000-0002-1269-1115 Esra Onkun Özgür 0000-0002-5179-8847 İsmail Karakaya 0000-0003-4308-6919
Early View Date	13 December 2024
Publication Date	31 December 2024
Submission Date	1 July 2024
Acceptance Date	17 October 2024
Published in Issue	Year 2024 Volume: 22 Issue: 3

Cite

APA	Öç, U., Onkun Özgür, E., & Karakaya, İ. (2024). Açık Uçlu Maddelerin Puanlanmasında Dereceli Puanlama Anahtarı Türünün Puanlayıcı Davranışlarına Etkisi. Türk Eğitim Bilimleri Dergisi, 22(3), 2123-2151. https://doi.org/10.37217/tebd.1501178

Türk Eğitim Bilimleri Dergisi is published by the Rectorate of Gazi University.