Examining Rater Biases of Peer Assessors in Different Assessment Environments
Year 2021, Volume 8, Issue 4, pp. 136-151. Published 31.10.2021.
The current study employed many-facet Rasch measurement (MFRM) to explain the rater bias patterns of EFL student teachers (hereafter, students) when they rate the teaching performance of their peers in three assessment environments: online, face-to-face, and anonymous. Twenty-four students and two instructors rated 72 micro-teaching sessions performed by senior Turkish students. Performance was assessed using a five-category analytic rubric developed by the researchers (Lesson Presentation, Classroom Management, Communication, Material, and Instructional Feedback). MFRM revealed severity and leniency biases in all three assessment environments at both the group and individual levels, with biases occurring less frequently in anonymous assessment. Central tendency and halo effects were observed only at the individual level in all three assessment environments and were similar across environments. Semi-structured interviews with peer raters (n = 24) documented their perspectives on how anonymous assessment affected severity, leniency, central tendency, and halo effects. Moreover, the findings showed that concealing peers' identities improves the reliability and validity of the measurements obtained during peer assessment.
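For readers unfamiliar with the analysis named in the abstract, a standard rating-scale formulation of the many-facet Rasch model (in the form popularized by Linacre's FACETS program) is sketched below. This is a generic illustration, not the authors' exact specification, which may include further facets such as the assessment environment:

\[
\log\!\left(\frac{P_{njik}}{P_{nji(k-1)}}\right) = B_n - C_j - D_i - F_k
\]

where \(P_{njik}\) is the probability that rater \(j\) awards ratee \(n\) a rating in category \(k\) on criterion \(i\), \(B_n\) is the ratee's ability, \(C_j\) the rater's severity, \(D_i\) the criterion's difficulty, and \(F_k\) the difficulty of moving from category \(k-1\) to category \(k\). Under this model, a rater with a large positive \(C_j\) is systematically severe and one with a large negative \(C_j\) is lenient, which is how severity and leniency biases of the kind reported in the abstract are detected.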
Yeşilçınar, S., & Şata, M. (2021). Examining Rater Biases of Peer Assessors in Different Assessment Environments. International Journal of Psychology and Educational Studies, 8(4), 136-151.