Research Article

Assessing Pre-Service Teachers’ Competencies in Open-Ended Item Development: Self-, Peer, Instructor Assessment

Year 2026, Volume: 17, Issue: 1, 24-41, 01.04.2026
https://izlik.org/JA55GA56YR

Abstract

This study examined the role of self-, peer, and instructor assessment in evaluating pre-service teachers’ competencies in open-ended item development and aimed to determine whether rater biases were present in the scoring. It also examined whether pre-service teachers’ scores differed by gender and education type. The participants, 116 pre-service teachers, were asked to prepare one open-ended item as a performance task in a measurement and evaluation course. The items were scored by the pre-service teachers themselves, by their peers, and by the instructor using a holistic rubric, and the data were analyzed with the many-facet Rasch model. The analysis showed that self-assessment was the most lenient rater type, instructor assessment produced the most severe ratings, and no rater bias was found in the scoring. In addition, female pre-service teachers rated more leniently than male pre-service teachers, and pre-service teachers studying in daytime classes rated more leniently than those in evening classes.

References

  • Adesiji, K. M., Agbonifo, O. C., Adesuyi, A. T., & Olabode, O. (2016). Development of an automated descriptive text-based scoring system. British Journal of Mathematics & Computer Science, 19(4), 1-14. https://doi.org/10.9734/BJMCS/2016/27558
  • Almazroa, H. & Alotaibi, W. (2023). Teaching 21st century skills: Understanding the depth and width of the challenges to shape proactive teacher education programmes. Sustainability, 15, 7365. https://doi.org/10.3390/su15097365
  • Alver, B. (2005). The empathic skills and decision-making strategies of students in the department of guidance and psychological counseling, faculty of education. Journal of Social Science and Humanities Researches, (14), 19-34.
  • Anderson, L. W., & Krathwohl, D. R. (Eds.) (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Pearson.
  • Ateş, G. Ç. & Köse, M. F. (2024). An analysis of university students’ academic achievement in relation to accommodation and various dependent variables. Journal of University Research, 7(3), 212-223. https://doi.org/10.32329/uad.1500037
  • Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Prentice-Hall.
  • Barrette, C. (2004). An analysis of foreign language achievement test drafts. Foreign Language Annals, 37(1), 58–70. https://doi.org/10.1111/j.1944-9720.2004.tb02173.x.
  • Bastarrica, M. C., & Simmonds, J. (2019). Gender differences in self and peer assessment in a software engineering capstone course. IEEE/ACM 2nd International Workshop on Gender Equality in Software Engineering, Montreal, CA, May 2019. https://doi.org/10.1109/GE.2019.00014
  • Birenbaum, M., Tatsuoka, K., & Gutvirtz, Y. (1992). Effects of response format on diagnostic assessment of scholastic achievement. Applied Psychological Measurement, 16(4), 353-363. https://doi.org/10.1177/0146621692016004
  • Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). Routledge.
  • Brookhart, S. M. (2014). How to design questions and tasks to assess student thinking. ASCD.
  • Cabello, V. M., & Topping, K. J. (2020). Peer assessment of teacher performance. What works in teacher education? International Journal of Cognitive Research in Science, Engineering and Education (IJCRSEE), 8(2), 121-132. https://doi.org/10.5937/IJCRSEE2002121C
  • De Marsico, M., Sciarrone, F., Sterbini, A., & Temperini, M. (2017). Supporting mediated peer-evaluation to grade answers to open-ended questions. EURASIA Journal of Mathematics Science and Technology Education, 13(4), 1085-1106. https://doi.org/10.12973/eurasia.2017.00660a
  • Demir, M. K. (2012). Analyzing empathy skills of primary school teacher candidates. Buca Faculty of Education Journal, (33), 107-121.
  • Dochy, F., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer and co-assessment in higher education. Studies in Higher Education, 24(3), 331-350. https://doi.org/10.1080/03075079912331379935
  • Ebel, R.L., & Frisbie, D.A. (1991). Essentials of educational measurement. Prentice Hall.
  • Eckes, T. (2015). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments. Peter Lang.
  • Eke, C. (2018). Analysis of objectives of high school physics curriculum according to the revised Bloom's taxonomy. Journal of Social Research and Behavioral Sciences, 4(6), 69-84.
  • Ercoşkun, M. H., & Nalçacı, A. (2008). The investigation of the empathic skills and democratic attitudes of the primary school teacher candidates. Milli Eğitim Dergisi, 37(180), 204-215.
  • Ercoşkun, M. H., Dilekmen, M., Ada, Ş., & Nalçacı, A. (2006). The investigation of empathic skills of the department of primary school teaching students as regards individual variations. Educational Academic Research, (13), 207–217.
  • Erkayıran, O., Şenocak, S. Ü., & Demirkıran, F. (2018). Investigation of empathic skill levels of nursing students in terms of some variables: A cross-sectional study. Journal of Nursing Science, 1(2), 01–04.
  • Erman-Aslanoglu, A., Karakaya, İ., & Sata, M. (2020). Evaluation of university students’ rating behaviors in self and peer rating process via many-facet Rasch model. Eurasian Journal of Educational Research, 20(89), 25-46. https://izlik.org/JA58KH69DL
  • Falchikov, N., & Boud, D. (1989). Student self-assessment in higher education: A meta-analysis. Review of Educational Research, 59(4), 395–430. https://doi.org/10.2307/1170205
  • Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3), 287-322. https://doi.org/10.2307/1170785
  • Fang, J.-W., Chang, S.-C., Hwang, G.-J., & Yang, G. (2021). An online collaborative peer‑assessment approach to strengthening pre-service teachers' digital content development competence and higher‑order thinking tendency. Education Tech Research Dev, 69, 1155–1181. https://doi.org/10.1007/s11423-021-09990-7
  • Farrokhi, F., Esfandiari, R., & Schaefer, E. (2012). A many-facet Rasch measurement of differential rater severity/leniency in three types of assessment. JALT Journal, 34(1), 79-101.
  • Farrokhi, F., Esfandiari, R., & Vaez Dalili, M. (2011). Applying the many-facet Rasch model to detect centrality in self-assessment, peer-assessment and teacher assessment. World Applied Sciences Journal, 15(11), 76-83.
  • Fraenkel, J. R., & Wallen, N. E. (2005). How to design and evaluate research in education. McGraw-Hill.
  • Genç, S. Z., & Kalafat, T. (2010). Prospective teachers’ problem solving skills and emphatic skills. Journal of Theoretical Educational Science, 3(2), 135-147.
  • Gielen, S., Dochy, F., & Onghena, P. (2010). An inventory of peer assessment diversity. Assessment and Evaluation in Higher Education, 36(2), 137-155. https://doi.org/10.1080/02602930903221444
  • Gierl, M. J., Latifi, S., Lai, H., Boulais, A. P., & Champlain, A. D. (2014). Automated essay scoring and the future of educational assessment in medical education. Medical Education, 48, 950-962. https://doi.org/10.1111/medu.12517
  • Goodrich, H. (1997). Understanding Rubrics: The dictionary may define "rubric", but these models provide more clarity. Educational Leadership, 54(4), 14-17.
  • Heritage, M. (2007). Formative assessment: What do teachers need to know and do? Phi Delta Kappan.  https://kappanonline.org/formative-assessment-heritage/
  • Kane, J., Bernardin, H., Villanueva, J., & Peyrefitte, J. (1995). Stability of rater leniency: Three studies. Academy of Management Journal, 38(4), 1036-1051. https://doi.org/10.2307/256619
  • Kane, J. S., & Lawler, E. E. (1978). Methods of peer assessment. Psychological Bulletin, 85(3), 555-586. https://doi.org/10.1037/0033-2909.85.3.555
  • Karakaya, İ. (2015). Comparison of self, peer, and instructor assessments in the portfolio assessment by using the many-facet Rasch model. Journal of Education and Human Development, 4(2), 182-192. https://doi.org/10.15640/jehd.v4n2a22
  • Kim, Y., Park, I., & Kang, M. (2012). Examining rater effects of the TGMD-2 on children with intellectual disability. Adapted Physical Activity Quarterly, 29(4), 346-365. https://doi.org/10.1123/apaq.29.4.346
  • Knoch, U., Read, J., & von Randow, T. (2007). Re-training writing raters online: How does it compare with face-to-face training? Assessing Writing, 12(2), 26-43. https://doi.org/10.1016/j.asw.2007.04.001
  • Kyllonen, P. C. (2012). Measurement of 21st century skills within the common core state standards. Paper presented at the Invitational Research Symposium on Technology Enhanced Assessments, USA, May 2012.
  • La Velle, L. (2019). The theory–practice nexus in teacher education: New evidence for effective approaches. Journal of Education for Teaching, 45(4), 369-372. https://doi.org/10.1080/02607476.2019.1639267
  • Lejk, M. & Wyvill, M. (2001). The effect of the inclusion of self-assessment with peer-assessment of contributions to a group project: A Quantitative study of secret and agreed assessments. Assessment and Evaluation in Higher Education, 26(6), 551–61. https://doi.org/10.1080/02602930120093887
  • Leonard, D. K., & Jiang, J. (1999). Gender bias and the college predictors of the SATs: A cry of Despair. Research in Higher Education, 40(4), 375-407.
  • Li, H., Xiong, Y., Hunter, C. V., Guo, X., & Tywoniw, R. (2020). Does peer assessment promote student learning? A meta-analysis. Assessment and Evaluation in Higher Education, 45(2), 193-211. https://doi.org/10.1080/02602938.2019.1620679
  • Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
  • Linacre, J. M. (2012). FACETS (Version 3.70.1) [Computer Software]. https://www.winsteps.com/facgood.htm
  • Linacre, J. M. (2017). FACETS (Version 3.80.0) [Computer Software]. https://www.winsteps.com/facgood.htm
  • Main, J. B., & Sánchez-Peña, M. (2015). Student evaluations of team members: Is there gender bias? Paper presented at IEEE Frontiers in Education Conference (FIE), TX, USA, October, 2015. https://doi.org/10.1109/FIE.2015.7344177.
  • Maslach, C., & Jackson, S. E. (1981). The measurement of experienced burnout. Journal of Occupational Behavior, 2(2), 99–113. https://doi.org/10.1002/job.4030020205
  • Mertler, C. A. (2016). Classroom assessment: A practical guide for educators. Routledge.
  • Ministry of National Education (2017). Fen bilimleri dersi öğretim programı (ilkokul ve ortaokul 3, 4, 5, 6, 7 ve 8 .sınıflar) [Science course curriculum (primary and secondary school 3rd, 4th, 5th, 6th, 7th and 8th grades)]. Ankara, Turkey
  • Mumpuni, K. E., Priyayi, D. F., & Widoretno, S. (2022). How do students perform a peer assessment? International Journal of Instruction, 15(3), 751-766. https://doi.org/10.29333/iji.2022.15341a
  • Myford, C. M., & Wolfe, E. W. (2000). Strengthening the ties that bind: Improving the linking network in sparsely connected rating designs. Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2000.tb01832.x
  • Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386-422.
  • Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189-227.
  • Nilsson, P. (2013). What do we know and where do we go? Formative assessment in developing student teachers’ professional learning of teaching science. Teachers and Teaching, 19(2), 188-201. https://doi.org/10.1080/13540602.2013.741838
  • Oluwatayo, J. A., & Adebule, S. O. (2012). Assessment of teaching performance of student-teachers on teaching practice. International Education Studies, 5(5), 109-115. https://doi.org/10.5539/ies.v5n5p109
  • Osman, S. (2021). Basic school teachers’ assessment practices in the sissala east municipality, Ghana. European Journal of Education Studies, 8(7), 44-74. https://doi.org/10.46827/ejes.v8i7.3801
  • Palmer, K., & Richardson, P. (2003). On-line assessment and free-response input-a pedagogic and technical model for squaring the circle. Paper presented at Proc. 7th CAA Conference, Loughborough, UK, December 2003.
  • Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory Into Practice, 48(1), 4–11. https://doi.org/10.1080/00405840802577536
  • Rahmawati, Y., Ridwan, A., Hadinugrahaningsih, T., & Soeprijanto. (2019, January). Developing critical and creative thinking skills through STEAM integration in chemistry learning. In Journal of Physics: Conference Series (Vol. 1156, p. 012033). IOP Publishing.
  • Sadler, P. M., & Good, E. (2006). The impact of self- and peer-grading on student learning. Educational Assessment, 11(1), 1-31.
  • Sari, D.K., Dinata, P. A. C., & Uspayanti, R. (2022). The teachers’ competencies to develop assessments for high school students in Merauke. Ishlah: Jurnal Pendidikan, 14(3), 3199 – 3206. https://doi.org/10.35445/alishlah.v14i
  • Sasmaz-Oren, F. (2012). The effects of gender and previous experience on the approach of self and peer assessment: A case from Turkey. Innovations in Education and Teaching International, 49(2), 123-133. https://doi.org/10.1080/14703297.2012.677598
  • Şata, M. (2022). Açık uçlu sorular [Open-ended questions]. In İ. Karakaya (Ed.), Açık uçlu soruların hazırlanması, uygulanması ve değerlendirilmesi [Preparation, administration and evaluation of open-ended questions] (pp. 1-11). Pegem.
  • Şata, M., & Karakaya, İ. (2021). Investigating the effect of rater training on differential rater function in assessing academic writing skills of higher education students. Journal of Measurement and Evaluation in Education and Psychology, 12(2), 163-181. https://doi.org/10.21031/epod.842094
  • Shen, B., & Bai, B. (2019). Facilitating university teachers’ continuing professional development through peer-assisted research and implementation teamwork in China. Journal of Education for Teaching, 45(4), 476-480. https://doi.org/10.1080/02607476.2019.1639265
  • Soland, J., Hamilton, L. S., & Stecher, B. M. (2013). Measuring 21st century competencies: Guidance for educators. RAND Corporation.
  • Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285. https://doi.org/10.1207/s15516709cog1202_4
  • Takeda, S., & Homberg, F. (2014). The effects of gender on group work process and achievement: An analysis through self- and peer-assessment. British Educational Research Journal, 40(2), 373–396.   https://doi.org/10.1002/berj.3088
  • Taş, U. E., Arıcı, Ö., Ozarkan, H. B., & Özgürlük, B. (2016). PISA 2015 ulusal raporu [PISA 2015 national report]. Ankara, Turkey: Milli Eğitim Bakanlığı Yayınları.
  • Torres-Guijarro, S., & Bengoechea, M. (2017). Gender differential in self-assessment: A fact neglected in higher education peer and self-assessment techniques. Higher Education Research and Development, 36(5), 1072-1084. https://doi.org/10.1080/07294360.2016.1264372
  • van Zundert, M., Sluijsmans, D., & van Merriënboer, J. J. G. (2010). Effective peer assessment processes: Research findings and future directions. Learning and Instruction, 20(4), 270-279. https://doi.org/10.1016/j.learninstruc.2009.08.004
  • Van Trieste, R. F. (1990). The relation between Puerto Rican university students’ attitudes toward Americans and the students’ achievement in English as a second language. Homines, (13–14), 94–112.
  • Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.
  • Wainer, H., & Steinberg, L. S. (1992). Sex differences in performance on the mathematics section of the Scholastic Aptitude Test: A bidirectional validity study. Harvard Educational Review, 62(3), 323-336. https://doi.org/10.17763/haer.62.3.1p1555011301r133
  • Wen, M. L., & Tsai, C. (2008). Online peer assessment in an in-service science and mathematics teacher education course. Teaching in Higher Education, 13(1), 55-67. https://doi.org/10.1080/13562510701794050
  • Wilson, F. R., Pan, W., & Schumsky, D. A. (2012). Recalculation of the critical values for Lawshe’s content validity ratio. Measurement and Evaluation in Counseling and Development, 45(3), 197-210. https://doi.org/10.1177/07481756124402
  • Winke, P., Gass, S., & Myford, C. (2013). Raters’ L2 background as a potential source of bias in rating oral performance. Language Testing, 30(2), 231-252. https://doi.org/10.1177/0265532212456968
  • Wolfe, E. W., & McVay, A. (2012). Application of latent trait models to identify raters exhibiting score scale usage problems. Applied Measurement in Education, 25(2), 125–143.
  • Yaz, Ö., & Kurnaz, M. (2017). The Examination of 2013 Science Curricula. International Journal of Turkish Education Science, 2017(8), 173-184.
  • Yenen, E. T. (2021). Prospective teachers’ professional skill needs: A Q method analysis. Teacher Development, 25(2), 196–214. https://doi.org/10.1080/13664530.2021.1877188
  • Yeşilçınar, S., & Şata, M. (2021). Examining rater biases of peer assessors in different assessment environments. International Journal of Psychology and Educational Studies, 8(4), 136-151. https://izlik.org/JA45TW35KZ
There are 82 citations in total.

Details

Primary Language English
Subjects Item Response Theory
Journal Section Research Article
Authors

Emine Yavuz 0000-0002-1991-1416

Mehmet Şata 0000-0003-2683-4997

Submission Date August 6, 2025
Acceptance Date March 9, 2026
Publication Date April 1, 2026
IZ https://izlik.org/JA55GA56YR
Published in Issue Year 2026 Volume: 17 Issue: 1

Cite

APA Yavuz, E., & Şata, M. (2026). Assessing Pre-Service Teachers’ Competencies in Open-Ended Item Development: Self-, Peer, Instructor Assessment. Journal of Measurement and Evaluation in Education and Psychology, 17(1), 24-41. https://izlik.org/JA55GA56YR