The effect of rater training on rating behaviors in peer assessment among secondary school students

Naizra Tursynbayeva; Umur Öç; İsmail Karakaya

doi:10.21449/ijate.1438798

Abstract

This study examined the effect of rater training, given to improve the peer assessment skills of secondary school students, on rater behaviors, using the many-facet Rasch measurement model. The research employed a single-group pretest-posttest design. Because all raters scored all students, the analyses used a fully crossed (s x r x c) design with three facets: student, rater, and criteria. The study group consisted of 25 seventh-grade students at a public school in Ankara in the 2021-2022 academic year. All 25 students wrote compositions; the researchers examined these and selected 10 for peer assessment. Before the training, the students evaluated their peers’ writing skills using a rubric developed by the researchers. They then received rater training for four weeks, after which they re-evaluated their peers’ writing skills. Four rater behaviors were examined: rater severity, rater leniency, differential rater severity, and differential rater leniency. The results indicated that rater training helped reduce severity, leniency, and differential severity and leniency.
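For orientation, the three-facet design named in the abstract (student, rater, criteria) is commonly written in the rating-scale form of the many-facet Rasch model. The expression below is that standard textbook form, not a specification taken from this study: the symbols are notation assumed here, and the abstract does not say whether a rating-scale or a partial-credit variant was fitted.

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \alpha_j - \beta_i - \tau_k
```

Here $P_{nijk}$ is the probability that rater $j$ assigns student $n$ category $k$ on criterion $i$, $\theta_n$ is the student's writing ability, $\alpha_j$ is the rater's severity, $\beta_i$ is the difficulty of the criterion, and $\tau_k$ is the difficulty of moving from category $k-1$ to category $k$. Under this parameterization a larger $\alpha_j$ lowers the log-odds of the higher category, which corresponds to a more severe rater, and a smaller $\alpha_j$ corresponds to a more lenient one. Differential severity and leniency are typically examined by adding interaction (bias) terms between facets to this base model.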

Keywords

Ethical Statement

The authors declare no conflict of interest. This research study complies with research publishing ethics. The scientific and legal responsibility for manuscripts published in IJATE belongs to the authors. Ethics Committee Number: Gazi University, 22/04/2022, E-77082166-604.01.02-345966. A part of this study was presented at the 8th International Congress on Measurement and Evaluation in Education and Psychology, Ege University, İzmir, Turkey, 21-23 September.

Details

Primary Language

English

Subjects

Classroom Measurement Practices, Measurement and Evaluation in Education (Other)

Journal Section

Research Article

Authors

Naizra Tursynbayeva
0000-0002-2165-3276
Kazakhstan

Umur Öç
0000-0002-1269-1115
Türkiye

İsmail Karakaya*
0000-0003-4308-6919
Türkiye

Early Pub Date

August 27, 2024

Publication Date

September 9, 2024

Submission Date

February 17, 2024

Acceptance Date

July 21, 2024

Published in Issue

Year 2024 Volume: 11 Number: 3

DOI

https://doi.org/10.21449/ijate.1438798

IZ

https://izlik.org/JA63YR25SD

Cite

APA

Tursynbayeva, N., Öç, U., & Karakaya, İ. (2024). The effect of rater training on rating behaviors in peer assessment among secondary school students. International Journal of Assessment Tools in Education, 11(3), 507-523. https://doi.org/10.21449/ijate.1438798

AMA

1. Tursynbayeva N, Öç U, Karakaya İ. The effect of rater training on rating behaviors in peer assessment among secondary school students. Int. J. Assess. Tools Educ. 2024;11(3):507-523. doi:10.21449/ijate.1438798

Chicago

Tursynbayeva, Naizra, Umur Öç, and İsmail Karakaya. 2024. “The Effect of Rater Training on Rating Behaviors in Peer Assessment Among Secondary School Students”. International Journal of Assessment Tools in Education 11 (3): 507-23. https://doi.org/10.21449/ijate.1438798.

EndNote

Tursynbayeva N, Öç U, Karakaya İ (September 1, 2024) The effect of rater training on rating behaviors in peer assessment among secondary school students. International Journal of Assessment Tools in Education 11 3 507–523.

IEEE

[1] N. Tursynbayeva, U. Öç, and İ. Karakaya, “The effect of rater training on rating behaviors in peer assessment among secondary school students”, Int. J. Assess. Tools Educ., vol. 11, no. 3, pp. 507–523, Sept. 2024, doi: 10.21449/ijate.1438798.

ISNAD

Tursynbayeva, Naizra - Öç, Umur - Karakaya, İsmail. “The Effect of Rater Training on Rating Behaviors in Peer Assessment Among Secondary School Students”. International Journal of Assessment Tools in Education 11/3 (September 1, 2024): 507-523. https://doi.org/10.21449/ijate.1438798.

JAMA

1. Tursynbayeva N, Öç U, Karakaya İ. The effect of rater training on rating behaviors in peer assessment among secondary school students. Int. J. Assess. Tools Educ. 2024;11:507–523.

MLA

Tursynbayeva, Naizra, et al. “The Effect of Rater Training on Rating Behaviors in Peer Assessment Among Secondary School Students”. International Journal of Assessment Tools in Education, vol. 11, no. 3, Sept. 2024, pp. 507-23, doi:10.21449/ijate.1438798.

Vancouver

1. Naizra Tursynbayeva, Umur Öç, İsmail Karakaya. The effect of rater training on rating behaviors in peer assessment among secondary school students. Int. J. Assess. Tools Educ. 2024 Sep. 1;11(3):507-23. doi:10.21449/ijate.1438798

Cited By

Açık Uçlu Maddelerin Puanlanmasında Dereceli Puanlama Anahtarı Türünün Puanlayıcı Davranışlarına Etkisi [The Effect of Rubric Type on Rater Behaviors in the Scoring of Open-Ended Items]

Türk Eğitim Bilimleri Dergisi

https://doi.org/10.37217/tebd.1501178