Examination of the Reliability of the Measurements Regarding the Written Expression Skills According to Different Test Theories
Abstract
This study examines reliability estimates for an analytic rubric of written expression skills under Classical Test Theory (CTT), Generalizability Theory (GT), and Item Response Theory (IRT), which differ in their fields of application. In this descriptive study, stories written by the 523 students in the study group were scored by seven raters. Under CTT, the eta coefficient indicated no difference among the raters' scores (η = .926), and Cronbach's alpha coefficients were above .88. Under GT, the G and Phi coefficients were above .97: students were differentiated as expected, the difficulty levels of the criteria did not change from one student to another, and the consistency among the raters' scores was excellent. Under IRT, parameters were estimated with Samejima's (1969) Graded Response Model, and item discrimination differed across raters. According to the b parameters, for all raters, a student needs an ability level of at least -2.35, -0.80, and 0.41 to be scored above categories 0, 1, and 2, respectively, with a probability of .50. Marginal reliability coefficients were quite high (around .93). The Fisher Z' statistic was computed to test the significance of the differences among all reliability estimates. GT gave more detailed information than CTT for explaining the sources of error variance and determining reliability, while IRT gave more detailed information than CTT for item-level error estimates and ability levels. The interrater reliability estimates from CTT and GT differed significantly (p < .05), whereas those from CTT and IRT did not (p > .05).
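The interpretation of the b parameters can be illustrated with a minimal sketch of the Graded Response Model's cumulative curve. Only the three threshold values come from the abstract. The discrimination value `a=1.0` is a placeholder (the abstract reports that discrimination differed across raters but gives no values), the plain logistic metric without a scaling constant is an assumption, and the four score categories (0 to 3) are inferred from the three thresholds. The function names are hypothetical and do not reproduce the authors' estimation procedure.

```python
import math

# Threshold (b) parameters reported in the abstract, common to all raters.
B_THRESHOLDS = [-2.35, -0.80, 0.41]


def prob_above(theta, b, a=1.0):
    """P(score is above the category whose threshold is b) at ability theta.

    Uses the two-parameter logistic form of the GRM cumulative curve.
    `a` is a hypothetical discrimination value, not one from the study.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def category_probs(theta, thresholds=B_THRESHOLDS, a=1.0):
    """Probability of each score category (0..3) at ability theta.

    Each category probability is the difference between adjacent
    cumulative probabilities, bounded by 1 below and 0 above.
    """
    cum = [1.0] + [prob_above(theta, b, a) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


if __name__ == "__main__":
    # At theta equal to a threshold, the chance of scoring above it is .50,
    # whatever the discrimination value.
    for b in B_THRESHOLDS:
        print(b, prob_above(b, b))
    # Category probabilities for a student of average ability (theta = 0).
    print(category_probs(0.0))
```

Because the exponent vanishes when theta equals b, the .50 probability at each threshold holds for any discrimination value; the discrimination only changes how sharply the probability rises around that point.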
Details
Primary Language
English
Journal Section
Research Article
Authors
Şeref Tan
0000-0002-9892-3369
Türkiye
Publication Date
September 4, 2019
Submission Date
April 30, 2019
Acceptance Date
July 22, 2019
Published in Issue
Year 2019 Volume: 10 Number: 3