The effect of rater training on rating behaviors in peer assessment among secondary school students
Year 2024, 507-523, 09.09.2024
Naizra Tursynbayeva, Umur Öç, İsmail Karakaya
Abstract
This study examined the effect of rater training, given to improve the peer assessment skills of secondary school students, on rater behaviors, using the many-facet Rasch measurement model. The research employed a single-group pretest-posttest design. Because all raters scored all students, the analyses were carried out in a fully crossed (s x r x c) design with three facets: student, rater, and criteria. The study group consisted of 25 seventh-grade students at a public school in Ankara in the 2021-2022 academic year. All 25 students wrote compositions; the researchers examined these and selected 10 for peer assessment. Before the experiment, students evaluated their peers’ writing skills using a rubric developed by the researchers. Rater training was then given to the students for four weeks, after which they re-evaluated their peers’ writing skills. Four rater behaviors were examined: rater severity, rater leniency, differentiated rater severity, and differentiated rater leniency. The results indicated that rater training contributed to reducing severity, leniency, and differentiated severity and leniency behaviors.
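For readers unfamiliar with the model, a three-facet many-facet Rasch model with student, rater, and criteria facets is commonly written in the rating scale form below. This is a general sketch rather than the authors' exact specification: the abstract does not say whether a rating scale or partial credit variant was fitted, and the symbols are chosen here for illustration.

```latex
% Three-facet rating scale formulation (illustrative notation, not taken from the paper)
\[
\ln\!\left(\frac{P_{njik}}{P_{nji(k-1)}}\right) = \theta_n - \alpha_j - \delta_i - \tau_k
\]
% P_{njik}     : probability that student n receives category k from rater j on criterion i
% P_{nji(k-1)} : probability of category k-1 under the same conditions
% \theta_n     : ability of student n
% \alpha_j     : severity of rater j (larger values mean a harsher rater)
% \delta_i     : difficulty of criterion i
% \tau_k       : threshold between categories k-1 and k
```

Under this sign convention, rater severity and leniency are read from the spread of the \(\alpha_j\) estimates. Differentiated severity or leniency is typically examined by adding a rater-by-student or rater-by-criterion interaction (bias) term to the right-hand side.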
Ethical Statement
The authors declare no conflict of interest. This research study complies with research publishing ethics. The scientific and legal responsibility for manuscripts published in IJATE belongs to the authors. Ethics Committee Number: Gazi University, 22/04/2022, E-77082166-604.01.02-345966
Part of this study was presented at the 8th International Congress on Measurement and Evaluation in Education and Psychology, Ege University, İzmir, Turkey, 21-23 September.