Research Article

Comparison of Inter-Rater Reliability Techniques in Performance-Based Assessment

Volume: 9 Number: 2 June 26, 2022

Abstract

The aim of this study is to analyse the importance of the number of raters and to compare results obtained with techniques based on Classical Test Theory (CTT) and Generalizability (G) Theory. Inter-rater reliability was determined using the CTT-based Kappa and Krippendorff alpha techniques. In this descriptive study, the data consist of twenty individual investigation performance reports prepared by learners in the International Baccalaureate Diploma Programme (IBDP) and the scores of the five raters who rated these reports. The raters used an analytical rubric developed by the International Baccalaureate Organization (IBO) as the scoring tool. The CTT results show that the Kappa and Krippendorff alpha statistics failed to provide information about the sources of error causing disagreement on the criteria. The analyses based on G Theory, in contrast, provided comprehensive data about the sources of error and showed that increasing the number of raters would also increase the reliability of the scores. The raters nevertheless emphasized the importance of developing the descriptors of the criteria in the rubric.
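To illustrate the kind of CTT-based agreement statistic the study uses, the following sketch computes Cohen's kappa for two raters from scratch. The rater names and scores are hypothetical, not the study's data; kappa corrects the observed agreement for the agreement expected by chance from each rater's marginal score distribution.

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters scoring the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from the raters' marginal distributions.
    """
    assert len(r1) == len(r2) and len(r1) > 0
    n = len(r1)
    # Observed agreement: proportion of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected chance agreement from each rater's category frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(r1) | set(r2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical rubric levels (0-3) assigned by two raters to ten reports.
rater_a = [2, 3, 1, 2, 0, 3, 2, 1, 2, 3]
rater_b = [2, 3, 1, 1, 0, 3, 2, 2, 2, 3]
print(round(cohens_kappa(rater_a, rater_b), 3))  # → 0.714
```

As the abstract notes, a single coefficient of this kind summarizes agreement but says nothing about *why* raters disagree; decomposing the error variance into sources (raters, tasks, their interactions) is what the G Theory analysis adds.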


Details

Primary Language

English

Subjects

Studies on Education

Journal Section

Research Article

Publication Date

June 26, 2022

Submission Date

September 10, 2021

Acceptance Date

May 17, 2022

Published in Issue

Year 2022 Volume: 9 Number: 2

APA
Arslan Mancar, S., & Gülleroğlu, H. D. (2022). Comparison of Inter-Rater Reliability Techniques in Performance-Based Assessment. International Journal of Assessment Tools in Education, 9(2), 515-533. https://doi.org/10.21449/ijate.993805
