The Role of Time on Performance Assessment (Self, Peer and Teacher) in Higher Education: Rater Drift
Year 2023, Volume 10, Issue 5, 98-118, 01.09.2023
Hikmet Şevgin, Mehmet Şata
Abstract
This study investigated the change in teacher candidates' oral presentation skills over time through self, peer, and teacher assessments, using rater drift analysis. A longitudinal descriptive design was employed within a quantitative research approach. The study group consisted of 47 teacher candidates receiving pedagogical formation training at a state university in the Eastern Anatolia Region, together with the instructor teaching the course. An analytic rubric was used as the data collection tool to evaluate the candidates' oral presentation skills, and data collection lasted six weeks in total. Because the aim was to examine change in performance ratings over time, the many-facet Rasch model was used. The findings showed that the rating behavior of the teacher candidates differed to a statistically significant degree at the group level over time, and that 26 of the 48 peer raters exhibited rater drift in their evaluations. Most of the drift was positive, meaning that raters became more lenient over time. By contrast, the teacher's assessments showed no rater drift, remaining similar across the six weeks. The findings were discussed in relation to previous studies in the literature, and recommendations were offered to researchers.
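For readers unfamiliar with the method, drift analyses of this kind typically extend the many-facet Rasch model with a time facet and a rater-by-time interaction. The sketch below is a standard textbook formulation (cf. Linacre, 2017; Wolfe et al., 1999), not notation taken from this article; the symbols, and the omission of a separate rubric-criterion facet, are simplifying assumptions.

$$
\ln\!\left(\frac{P_{njtk}}{P_{njt(k-1)}}\right) = \theta_n - \lambda_j - \tau_t - \gamma_{jt} - F_k
$$

Here $P_{njtk}$ is the probability that presenter $n$ receives category $k$ rather than $k-1$ from rater $j$ at occasion (week) $t$; $\theta_n$ is the presenter's ability, $\lambda_j$ the rater's overall severity, $\tau_t$ the overall difficulty of occasion $t$, $F_k$ the rating-scale threshold, and $\gamma_{jt}$ the rater-by-time interaction. A statistically significant $\gamma_{jt}$ flags drift for rater $j$: a severity estimate that decreases across weeks corresponds to the increasing leniency reported in the abstract.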
References
- Alaz, A., & Yarar, S. (2009, May). Classroom teachers' preferences and reasons in the measurement and evaluation process. Paper presented at the 1st International Education Research Congress, Canakkale Onsekiz Mart University, Canakkale.
- Alici, D. (2010). Other measurement tools and methods used in evaluating student performance. In S. Tekindal (Ed.), Measurement and evaluation in education (pp. 127-168). Pegem Akademi Publishing.
- Ananiadou, K., & Claro, M. (2009). 21st century skills and competences for new millennium learners in OECD countries (OECD Education Working Papers, No. 41). OECD Publishing.
- Arik, R. S., & Kutlu, O. (2013). Scaling the competency of teachers' measurement and evaluation field based on judge decisions. Journal of educational sciences research, 3(2), 163-196. https://doi.org/10.12973/jesr.2013.3210a
- Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in education: Principles, policy & practice, 5(1), 7-74. https://doi.org/10.1080/0969595980050102
- Board of Education (2005). Introduction booklet of primary school grades 1-5 curriculum. Ministry of National Education.
- Borkan, B. (2017). Rater severity drift in peer assessment. Journal of measurement and evaluation in education and psychology, 8(4), 469-489. https://doi.org/10.21031/epod.328119
- Boud, D. (2013). Enhancing learning through self-assessment. Routledge. https://doi.org/10.4324/9781315041520
- Case, H. (1997). An examination of variation in rater severity over time: A study in rater drift. Objective measurement: Theory into practice, 5, 1-38.
- Cepni, S. (2010). Introduction to research and project work. Celepler Publishing.
- Colvin, S., & Vos, E. K. (1997). Authentic assessment models for statistics education. In I. Gal & J. B. Garfield (Eds.), The assessment challenge in statistics education (pp. 27-36). IOS Press.
- Congdon, P. J., & McQueen, J. (2000). The stability of rater severity in large-scale assessment programs. Journal of educational measurement, 37(2), 163-178. https://doi.org/10.1111/j.1745-3984.2000.tb01081.x
- Dikli, S. (2003). Assessment at a distance: Traditional vs. alternative assessments. Turkish online journal of educational technology-TOJET, 2(3), 13-19.
- Dishon, G., & Gilead, T. (2021). Adaptability and its discontents: 21st-century skills and the preparation for an unpredictable future. British journal of educational studies, 69(4), 393-413. https://doi.org/10.1080/00071005.2020.1829545
- Dogan, C. D., & Uluman, M. (2017). A comparison of rubrics and graded category rating scales with various methods regarding raters' reliability. Educational sciences: Theory and practice, 17(2), 631-651. https://doi.org/10.12738/estp.2017.2.0321
- Donnon, T., McIlwrick, J., & Woloschuk, W. (2013). Investigating the reliability and validity of self and peer assessment to measure medical students’ professional competencies. Creative education, 4(6A), 23-28. https://doi.org/10.4236/ce.2013.46A005
- Duban, N., & Kucukyilmaz, E. A. (2008). Classroom teacher candidates' views on the use of alternative assessment techniques in application schools. Elementary education online, 7(3), 769-784.
- Dunn, K. E., & Mulvenon, S. W. (2009). A critical review of research on formative assessments: The limited scientific evidence of the impact of formative assessments in education. Practical assessment, research, and evaluation, 14(1), 1-11. https://doi.org/10.7275/jg4h-rb87
- Engelhard Jr, G., & Myford, C. M. (2003). Monitoring faculty consultant performance in the Advanced Placement English Literature and Composition program with a many-faceted Rasch model. ETS Research Report Series, 2003(1), i-60. https://doi.org/10.1002/j.2333-8504.2003.tb01893.x
- Erman-Aslanoglu, A. & Sata, M. (2023). Examining the rater drift in the assessment of presentation skills in secondary school context. Journal of measurement and evaluation in education and psychology, 14(1), 62-75. https://doi.org/10.21031/epod.1213969
- Erman-Aslanoglu, A. (2017). Evaluation of an individual within a group: Peer and self-assessment. Bogazici university journal of education, 34(2), 35-50.
- Erman-Aslanoglu, A. (2022). Examining the effects of peer and self-assessment practices on writing skills. International journal of assessment tools in education, 9(Special Issue), 179-196. https://doi.org/10.21449/ijate.1127815
- Falchikov, N. (1995). Peer feedback marking: Developing peer assessment. Innovations in education and training international, 32(2), 175-187. https://doi.org/10.1080/1355800950320212
- Farrokhi, F., Esfandiari, R., & Dalili, M. V. (2011). Applying the many-facet Rasch model to detect centrality in self-assessment, peer-assessment and teacher assessment. World applied sciences journal, 15(11), 70-77.
- Farrokhi, F., Esfandiari, R., & Schaefer, E. (2012). A many-facet Rasch measurement of differential rater severity/leniency in three types of assessment. JALT journal, 34(1), 79-101. https://doi.org/10.37546/JALTJJ34.1-3
- Gelbal, S., & Kelecioglu, H. (2007). Teacher competency perceptions and problems encountered in measurement and evaluation methods. Hacettepe university journal of education, (33), 135-145.
- Gocer, A., Arslan, S., & Cayli, C. (2017). Process-oriented complementary assessment tools and methods for determining student development in Turkish education. Suleyman Demirel university journal of social sciences institute, (28), 263-292.
- Gomleksiz, M. N., Yetkiner, A., & Yildirim, F. (2011). Teachers’ views on the use of alternative assessment and evaluation techniques in life studies class. Education sciences, 6(1), 823-840.
- Guler, N. (2012). Measurement and assessment in education. Pegem Akademi Publishing. https://doi.org/10.14527/9786053641247
- Hafner, J., & Hafner, P. (2003). Quantitative analysis of the rubric as an assessment tool: An empirical study of student peer-group rating. International journal of science education, 25(12), 1509-1528. https://doi.org/10.1080/0950069022000038268
- Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge. https://doi.org/10.4324/9780203850381
- Hamayan, E. V. (1995). Approaches to alternative assessment. Annual review of applied linguistics, 15, 212-226. https://doi.org/10.1017/S0267190500002695
- Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. SAGE Publications.
- Harik, P., Clauser, B. E., Grabovsky, I., Nungester, R. J., Swanson, D., & Nandakumar, R. (2009). An examination of rater drift within a generalizability theory framework. Journal of educational measurement, 46(1), 43-58. https://doi.org/10.1111/j.1745-3984.2009.01068.x
- Hoskens, M., & Wilson, M. (2001). Real-time feedback on rater drift in constructed-response items: An example from the Golden State Examination. Journal of educational measurement, 38(2), 121-145. https://doi.org/10.1111/j.1745-3984.2001.tb01119.x
- Hoyt, W. T. (2000). Rater bias in psychological research: When is it a problem and what can we do about it? Psychological methods, 5(1), 64-86. https://doi.org/10.1037/1082-989X.5.1.64
- Karakaya, I. (2015). Comparison of self, peer and instructor assessments in the portfolio assessment by using many facet Rasch model. Journal of education and human development, 4(2), 182-192. https://doi.org/10.15640/jehd.v4n2a22
- Kassim, A. N. L. (2007, June). Exploring rater judging behaviour using the many-facet Rasch model. Paper presented at the Second Biennial International Conference on Teaching and Learning of English in Asia: Exploring New Frontiers (TELiA2), Universiti Utara Malaysia, Malaysia.
- Kilic, D., & Gunes, P. (2016). Self, peer, and teacher assessment with grading rubrics. Mehmet Akif Ersoy university journal of education faculty, 1(39), 58-69. https://doi.org/10.21764/efd.93792
- Kim, Y., Park, I., & Kang, M. (2012). Examining rater effects of the TGMD-2 on children with intellectual disability. Adapted physical activity quarterly, 29(4), 346-365. https://doi.org/10.1123/apaq.29.4.346
- Kooken, J., Welsh, M. E., McCoach, D. B., Miller, F. G., Chafouleas, S. M., Riley-Tillman, T. C., & Fabiano, G. (2017). Test order in teacher-rated behavior assessments: Is counterbalancing necessary? Psychological assessment, 29(1), 98-109. https://doi.org/10.1037/pas0000314
- Kosterelioglu, İ., & Celen, Ü. (2016). Evaluation of the effectiveness of self-assessment method. Ilkogretim online, 15(2), 671-681. https://doi.org/10.17051/io.2016.44304
- Koyuncu, M. S. & Sata, M. (2023). Using ACER ConQuest program to examine multidimensional and many-facet models. International journal of assessment tools in education, 10(2), 279-302. https://doi.org/10.21449/ijate.1238248
- Kutlu, O., Dogan, C. D., & Karakaya, I. (2010). Determination of student achievement: Performance-based and portfolio-based authentic assessment and evaluation practices. Pegem Akademi Publishing.
- Lamprianou, I. (2006). The stability of marker characteristics across tests of the same subject and across subjects. Journal of applied measurement, 7(2), 192-205.
- Leckie, G., & Baird, J. A. (2011). Rater effects on essay scoring: A multilevel analysis of severity drift, central tendency, and rater experience. Journal of educational measurement, 48(4), 399-418. https://doi.org/10.1111/j.1745-3984.2011.00152.x
- Linacre, J. M. (1996). Generalizability theory and many-facet Rasch measurement. Objective measurement: Theory into practice, 3, 85-98.
- Linacre, J. M. (2017). A user's guide to FACETS: Rasch-model computer programs. MESA Press.
- Linn, R. L. (2008). Measurement and assessment in teaching. Pearson Education.
- Maier, A., Adams, J., Burns, D., Kaul, M., Saunders, M., & Thompson, C. (2020). Using performance assessments to support student learning: How district initiatives can make a difference (Performance Assessment Case Study Series). Learning Policy Institute. https://doi.org/10.54300/213.365
- McLaughlin, K., Ainslie, M., Coderre, S., Wright, B., & Violato, C. (2009). The effect of differential rater function over time (DRIFT) on objective structured clinical examination ratings. Medical education, 43(10), 989-992. https://doi.org/10.1111/j.1365-2923.2009.03438.x
- McNamara, T. F., & Adams, R. J. (1991). Exploring rater characteristics with Rasch techniques. In Selected papers of the 13th Language Testing Research Colloquium (LTRC). Educational Testing Service, International Testing and Training Program Office.
- Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American psychologist, 50(9), 741-749. https://doi.org/10.1037/0003-066X.50.9.741
- Modarresi, G., Jalilzadeh, K., Coombe, C., & Nooshab, A. (2021). Validating a test to measure translation teachers' assessment literacy. Journal of Asia TEFL, 18(4), 1503-1511. https://doi.org/10.18823/asiatefl.2021.18.4.31.1503
- Mulqueen, C., Baker, D., & Dismukes, R. K. (2000, April). Using multifacet Rasch analysis to examine the effectiveness of rater training. Paper presented at the 15th Annual Conference of the Society for Industrial and Organizational Psychology. https://doi.org/10.1037/e540522012-001
- Myford, C. M., & Wolfe, E. W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of educational measurement, 46(4), 371-389. https://doi.org/10.1111/j.1745-3984.2009.00088.x
- Nalbantoglu Yilmaz, F. (2017). Analysis of the rater effects on the scoring of diagnostic trees prepared by teacher candidates with the many-facet Rasch model. International online journal of educational sciences, 8(18), 174-184. https://doi.org/10.15345/iojes.2016.02.020
- National Research Council. (2001). Classroom assessment and the national science education standards. National Academies Press.
- Noonan, B., & Duncan, C. R. (2005). Peer and self-assessment in high schools. Practical assessment, research, and evaluation, 10(1), 1-8. https://doi.org/10.7275/a166-vm41
- Oren, F. S., Ormanci, U., & Evrekli, E. (2014). The alternative assessment-evaluation approaches preferred by pre-service teachers and their self-efficacy towards these approaches. Educational sciences: Theory & practice, 11(3), 1690-1698.
- Orlova, N. (2019). Student peer performance evaluation: Importance of implementation for group work enhancement. Science and education a new dimension: Pedagogy and psychology, 26-29. https://doi.org/10.31174/SEND-PP2019-207VII84-05
- Ozpinar, I. (2021). Self, peer, group, and instructor assessment: A glimpse through the window of teacher competencies. Cumhuriyet international journal of education, 10(3), 949-973. https://doi.org/10.30703/cije.754885
- Palm, T. (2008). Performance assessment and authentic assessment: A conceptual analysis of the literature. Practical assessment, research, and evaluation, 13(4), 1-11. https://doi.org/10.7275/0qpc-ws45
- Park, Y. S. (2011). Rater drift in constructed response scoring via latent class signal detection theory and item response theory. Columbia University. https://doi.org/10.7916/D8445TGR
- Petra, T. Z. H. T., & Ab Aziz, M. J. (2020, April). Investigating reliability and validity of student performance assessment in higher education using Rasch model. Journal of physics: Conference series, 1529(4), 042088. IOP Publishing. https://doi.org/10.1088/1742-6596/1529/4/042088
- Quellmalz, E. (1980). Problems in stabilizing the judgment process (CSE Report No. 136). Center for the Study of Evaluation.
- Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied psychological measurement, 14(2), 197-207. https://doi.org/10.1177/014662169001400208
- Raymond, M. R., Harik, P., & Clauser, B. E. (2011). The impact of statistically adjusting for rater effects on conditional standard errors of performance ratings. Applied psychological measurement, 35(3), 235-246. https://doi.org/10.1177/0146621610390675
- Rennert-Ariev, P. (2005). A theoretical model for the authentic assessment of teaching. Practical assessment, research, and evaluation, 10(2), 1-12. https://doi.org/10.7275/a7h7-4111
- Sad, S. N., & Goktas, O. (2013). Examination of traditional and contemporary measurement and evaluation approaches of academic staff. Ege education journal, 14(2), 79-105.
- Sata, M. & Karakaya, I. (2022). Investigating the impact of rater training on rater errors in the process of assessing writing skill. International journal of assessment tools in education, 9(2), 492-514. https://doi.org/10.21449/ijate.877035
- Sata, M. (2020a). Quantitative research approaches. In E. Oğuz (Ed.), Research methods in education (pp. 77-98). Egiten Kitap Publications.
- Sata, M. (2020b, November). Evaluation of university students' oral presentation skills by their peers. Paper presented at the 13th International Education Community Symposium, online, Turkey.
- Shepard, L. A. (2000). The role of assessment in a learning culture. Educational researcher, 29(7), 4-14. https://doi.org/10.3102/0013189X029007004
- Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of educational measurement, 27(4), 361-370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
- Szőkol, I., Szarka, K., & Hargaš, J. (2022). The functions of educational evaluation. R&E-SOURCE, (S24). https://doi.org/10.53349/resource.2022.iS24.a1112
- Tunkler, V. (2019). Investigation of the contribution of peer assessment to pre-service teachers' professional knowledge and skills. Marmara university Atatürk education faculty journal of educational sciences, 50(50), 206-221. https://doi.org/10.15285/maruaebd.525171
- Uto, M. (2022). A Bayesian many-facet Rasch model with Markov modeling for rater severity drift. Behavior research methods, 1-19. https://doi.org/10.3758/s13428-022-01997-z
- Uzun, A., & Yurdabakan, I. (2011). An investigation of elementary school students' attitudes towards self-assessment. Mehmet Akif Ersoy university journal of education faculty, 11(22), 51-69.
- Wayda, V., & Lund, J. (2005). Assessing dispositions: An unresolved challenge in teacher education. Journal of physical education, recreation & dance, 76(1), 34-41. https://doi.org/10.1080/07303084.2005.10607317
- Wesolowski, B. C., Wind, S. A., & Engelhard Jr, G. (2017). Evaluating differential rater functioning over time in the context of solo music performance assessment. Bulletin of the council for research in music education, (212), 75-98. https://doi.org/10.5406/bulcouresmusedu.212.0075
- Wigglesworth, G. (1994). Patterns of rater behaviour in the assessment of an oral interaction test. Australian review of applied linguistics, 17(2), 77-103. https://doi.org/10.1075/aral.17.2.04wig
- Wolfe, E. W., Moulder, B. C., & Myford, C. M. (1999, April). Detecting differential rater functioning over time (DRIFT) using a Rasch multi-faceted rating scale model. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal, Quebec, Canada.
- Wolfe, E. W., Myford, C. M., Engelhard Jr, G., & Manalo, J. R. (2007). Monitoring reader performance and DRIFT in the AP® English literature and composition examination using benchmark essays (Research Report No. 2007-2). College Board.
- Yildiz, S. (2018). Developing a self-assessment scale for fractions. Mustafa Kemal university journal of faculty of education, 2(3), 30-44.
- Yurdabakan, I. (2012). The effect of peer and collaborative assessment training on pre-service teachers’ self-assessment skills. Education and science, 37(163), 190-202.
- Zhu, W., & Cole, E. L. (1996). Many-faceted Rasch calibration of a gross motor instrument. Research quarterly for exercise and sport, 67(1), 24-34. https://doi.org/10.1080/02701367.1996.10607922