Research Article

Inter-Rater Reliability Analysis in Performance-Based Assessment: A Comparison of Generalizability Coefficients and Rater Consistency

Volume: 7, Issue: 2, 30 September 2025

Abstract

This study investigates the reliability of a performance-based assessment tool used to evaluate university students’ basic statistical skills within the framework of Generalizability Theory (GT). A total of 80 students from the Guidance and Psychological Counseling program completed a two-hour examination consisting of 10 applied tasks, which were scored independently by two raters using a detailed analytic rubric. The scores were analyzed with a fully crossed design (p × i × r); variance components were estimated via maximum likelihood, and 95% confidence intervals were obtained with a cluster bootstrap procedure (1,000 resamples). Results showed that 50.2% of the total variance was attributable to students, 25.6% to items, and 16.6% to raters, while interaction effects remained small. For the original design, the relative generalizability coefficient was .98 and the absolute decision coefficient (Φ) was .81. When the number of items was increased to 15 and the number of raters to five, Φ improved to .90 and absolute error variance decreased to .45. These findings indicate that true performance differences among students were captured strongly, although rater effects could not be eliminated entirely. Expanding task coverage and increasing the number of raters proved effective for reducing both absolute and relative error variances. The study underscores the value of rubric use, investment in rater training, a multi-task–multi-rater approach, and GT-based revision cycles in high-stakes performance assessments, and the findings are expected to inform practical assessment strategies aimed at improving statistical literacy in teacher education programs. Replication with larger and more diverse samples across disciplines is recommended to enhance external validity.
Future directions may include implementing rater feedback cycles through online platforms and integrating rubric-supported scoring systems.

Keywords

Generalizability theory, performance-based assessment, rater reliability
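The design described in the abstract, a fully crossed p × i × r G-study with relative (Eρ²) and absolute (Φ) coefficients, D-study projections, and a cluster bootstrap over persons, can be sketched in Python. This is an illustration, not the paper's analysis: the paper estimated variance components by maximum likelihood, whereas this sketch uses the classical expected-mean-squares (ANOVA) estimators, and all scores below are simulated with arbitrary variance magnitudes.

```python
import numpy as np

def variance_components(X):
    """ANOVA (expected-mean-squares) variance component estimates for a
    fully crossed p x i x r G-study with one observation per cell.
    X: scores with shape (n_persons, n_items, n_raters)."""
    n_p, n_i, n_r = X.shape
    g = X.mean()
    mp, mi, mr = X.mean(axis=(1, 2)), X.mean(axis=(0, 2)), X.mean(axis=(0, 1))
    mpi, mpr, mir = X.mean(axis=2), X.mean(axis=1), X.mean(axis=0)

    # Mean squares for main effects, two-way interactions, and the residual
    ms_p = n_i * n_r * np.sum((mp - g) ** 2) / (n_p - 1)
    ms_i = n_p * n_r * np.sum((mi - g) ** 2) / (n_i - 1)
    ms_r = n_p * n_i * np.sum((mr - g) ** 2) / (n_r - 1)
    ms_pi = n_r * np.sum((mpi - mp[:, None] - mi[None, :] + g) ** 2) / ((n_p - 1) * (n_i - 1))
    ms_pr = n_i * np.sum((mpr - mp[:, None] - mr[None, :] + g) ** 2) / ((n_p - 1) * (n_r - 1))
    ms_ir = n_p * np.sum((mir - mi[:, None] - mr[None, :] + g) ** 2) / ((n_i - 1) * (n_r - 1))
    resid = (X - mpi[:, :, None] - mpr[:, None, :] - mir[None, :, :]
             + mp[:, None, None] + mi[None, :, None] + mr[None, None, :] - g)
    ms_pir = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1) * (n_r - 1))

    # Solve the expected-mean-squares equations; clamp negative estimates to 0
    return {
        'pir,e': ms_pir,
        'pi': max((ms_pi - ms_pir) / n_r, 0.0),
        'pr': max((ms_pr - ms_pir) / n_i, 0.0),
        'ir': max((ms_ir - ms_pir) / n_p, 0.0),
        'p': max((ms_p - ms_pi - ms_pr + ms_pir) / (n_i * n_r), 0.0),
        'i': max((ms_i - ms_pi - ms_ir + ms_pir) / (n_p * n_r), 0.0),
        'r': max((ms_r - ms_pr - ms_ir + ms_pir) / (n_p * n_i), 0.0),
    }

def d_study(v, ni, nr):
    """Relative (E rho^2) and absolute (Phi) coefficients for a D-study
    with ni items and nr raters."""
    rel_err = v['pi'] / ni + v['pr'] / nr + v['pir,e'] / (ni * nr)
    abs_err = rel_err + v['i'] / ni + v['r'] / nr + v['ir'] / (ni * nr)
    return v['p'] / (v['p'] + rel_err), v['p'] / (v['p'] + abs_err)

# Simulated scores: 80 persons x 10 items x 2 raters (variances are arbitrary)
rng = np.random.default_rng(0)
X = (rng.normal(0, 1.5, (80, 1, 1))      # person (true-score) effect
     + rng.normal(0, 1.0, (1, 10, 1))    # item effect
     + rng.normal(0, 0.8, (1, 1, 2))     # rater effect
     + rng.normal(0, 0.5, (80, 10, 2)))  # residual / interactions

v = variance_components(X)
print(d_study(v, 10, 2))   # original design
print(d_study(v, 15, 5))   # D-study: more items and raters raise Phi

# Cluster bootstrap over persons for a 95% CI on Phi
# (200 resamples to keep the sketch fast; the paper used 1,000)
boot = [d_study(variance_components(X[rng.integers(0, 80, 80)]), 10, 2)[1]
        for _ in range(200)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Phi 95% CI: [{lo:.2f}, {hi:.2f}]")
```

Because absolute error adds the item, rater, and item-by-rater components to the relative error, Φ can only be lower than Eρ², which mirrors the gap between .98 and .81 reported in the abstract; increasing ni and nr shrinks every error term, which is why the projected Φ rises.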

Details

Primary Language

English

Subjects

Measurement Theories and Applications in Education and Psychology

Section

Research Article

Early View Date

30 September 2025

Publication Date

30 September 2025

Submission Date

19 July 2025

Acceptance Date

23 September 2025

Published Issue

Year 2025, Volume: 7, Issue: 2

Cite

APA
Köroğlu, M. (2025). Inter-Rater Reliability Analysis in Performance-Based Assessment: A Comparison of Generalizability Coefficients and Rater Consistency. Ahmet Keleşoğlu Eğitim Fakültesi Dergisi, 7(2), 218-234. https://doi.org/10.38151/akef.2025.158