Research Article

Assessing the Reliability of Open-Ended Exams: A Generalizability Theory Approach to Item and Rater Variance

Volume: 12, Issue: 24 · October 24, 2025


Abstract

This study examines the reliability of open-ended university exams through the lens of Generalizability Theory (GT), aiming to identify the principal sources of measurement error. Using a fully crossed person × item × rater (p × i × r) design, a five-item written exam administered to 76 students was scored by two raters. The Generalizability Study (G-Study) revealed that the largest share of total score variance stemmed from individual student differences (62.2%) and the person × item interaction (30.7%), while the item (3.9%) and rater (1.5%) variance components were relatively minor. These results suggest that the exam effectively captures individual performance differences and that increasing the number of items may substantially reduce measurement error. The Decision Study (D-Study) showed that expanding the number of items from 4 to 10 and the number of raters from 1 to 5 produced marked reductions in both relative (σ²δ) and absolute (σ²Δ) error variances; correspondingly, the generalizability and Phi coefficients increased from 0.81 to 0.95. The low rater variance implies that the use of detailed scoring rubrics and rater training contributed to consistent scoring. Moreover, residual error was minimal (1.6%), indicating good model fit. From a practical standpoint, the results support increasing the item count to at least eight and involving at least three raters to optimize reliability. The study demonstrates the effectiveness of GT in disentangling multiple sources of error and offers guidance for improving assessment quality in higher education. Emphasizing item diversity, rater standardization, and data-informed decision-making can strengthen the validity and fairness of exam-based evaluations.
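The D-Study projections described above follow directly from the G-Study variance components: the generalizability (G) coefficient divides person variance by person variance plus relative error, while the Phi coefficient additionally charges the item, rater, and item × rater facets. A minimal sketch of this computation, using the variance proportions reported in the abstract and assuming the unreported person × rater and item × rater components are negligible (they are not given in the abstract):

```python
# D-Study projection of G and Phi coefficients from G-Study variance
# components in a fully crossed p x i x r design.
# Component values below are the percentages reported in the abstract;
# the person x rater and item x rater components are not reported and
# are assumed zero here -- a sketch, not the article's exact analysis.

def g_and_phi(vc, n_i, n_r):
    """Return (G, Phi) for n_i items and n_r raters.

    vc: dict of variance components with keys
        'p', 'i', 'r', 'pi', 'pr', 'ir', 'pir_e'.
    """
    # Relative error: person-interaction components, divided by facet sizes.
    rel_err = vc["pi"] / n_i + vc["pr"] / n_r + vc["pir_e"] / (n_i * n_r)
    # Absolute error additionally includes the item and rater main effects.
    abs_err = rel_err + vc["i"] / n_i + vc["r"] / n_r + vc["ir"] / (n_i * n_r)
    g = vc["p"] / (vc["p"] + rel_err)
    phi = vc["p"] / (vc["p"] + abs_err)
    return g, phi

vc = {"p": 62.2, "i": 3.9, "r": 1.5, "pi": 30.7,
      "pr": 0.0, "ir": 0.0, "pir_e": 1.6}  # pr, ir assumed negligible

for n_i, n_r in [(4, 1), (5, 2), (8, 3), (10, 5)]:
    g, phi = g_and_phi(vc, n_i, n_r)
    print(f"items={n_i:2d} raters={n_r}  G={g:.2f}  Phi={phi:.2f}")
```

Because the proportions are rounded and two components are set to zero, the printed coefficients only approximate the article's reported 0.81–0.95 range; the point of the sketch is the monotone improvement of both coefficients as items and raters are added.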

Keywords


Details

Primary Language

English

Subjects

Measurement Theories and Applications in Education and Psychology

Section

Research Article

Publication Date

October 24, 2025

Submission Date

July 12, 2025

Acceptance Date

September 29, 2025

Published Issue

Year 2025, Volume: 12, Issue: 24

Cite

APA
Köroğlu, M. (2025). Assessing the Reliability of Open-Ended Exams: A Generalizability Theory Approach to Item and Rater Variance. İnönü Üniversitesi Eğitim Bilimleri Enstitüsü Dergisi, 12(24), 56-69. https://doi.org/10.29129/inujgse.1740879