Research Article

Assessing Pre-Service Teachers’ Competencies in Open-Ended Item Development: Self-, Peer, Instructor Assessment

Year 2026, Volume: 17, Issue: 1, 24-41, 01.04.2026
https://izlik.org/JA55GA56YR

Abstract

This study examined the role of self-, peer, and instructor assessment in evaluating pre-service teachers’ competencies in open-ended item development and aimed to determine whether rater biases were present in the scoring. It also examined whether pre-service teachers’ scores differed by gender and education type. The participants, 116 pre-service teachers, were asked to prepare one open-ended item as a performance task in a measurement and evaluation course. The items were scored by the pre-service teachers themselves, by their peers, and by the instructor using a holistic rubric, and the data were analyzed with the many-facet Rasch model. The analysis showed that self-assessment was the most lenient rater type, instructor assessment produced the most severe ratings, and no rater bias was found in the scoring. In addition, female pre-service teachers rated more leniently than male pre-service teachers, and pre-service teachers studying in daytime classes rated more leniently than those in evening classes.

References

  • Adesiji, K. M., Agbonifo, O. C., Adesuyi, A. T., & Olabode, O. (2016). Development of an automated descriptive text-based scoring system. British Journal of Mathematics & Computer Science, 19(4), 1-14. https://doi.org/10.9734/BJMCS/2016/27558
  • Almazroa, H. & Alotaibi, W. (2023). Teaching 21st century skills: Understanding the depth and width of the challenges to shape proactive teacher education programmes. Sustainability, 15, 7365. https://doi.org/10.3390/su15097365
  • Alver, B. (2005). The empathic skills and decision-making strategies of students in the department of guidance and psychological counseling, faculty of education. Journal of Social Science and Humanities Researches, (14), 19-34.
  • Anderson, L. W., & Krathwohl, D. R. (Eds.) (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Pearson.
  • Ateş, G. Ç. & Köse, M. F. (2024). An analysis of university students’ academic achievement in relation to accommodation and various dependent variables. Journal of University Research, 7(3), 212-223. https://doi.org/10.32329/uad.1500037
  • Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Prentice-Hall.
  • Barrette, C. (2004). An analysis of foreign language achievement test drafts. Foreign Language Annals, 37(1), 58–70. https://doi.org/10.1111/j.1944-9720.2004.tb02173.x.
  • Bastarrica, M. C., & Simmonds, J. (2019). Gender differences in self and peer assessment in a software engineering capstone course. IEEE/ACM 2nd International Workshop on Gender Equality in Software Engineering, Montreal, CA, May 2019. https://doi.org/10.1109/GE.2019.00014
  • Birenbaum, M., Tatsuoka, K., & Gutvirtz, Y. (1992). Effects of response format on diagnostic assessment of scholastic achievement. Applied Psychological Measurement, 16(4), 353-363. https://doi.org/10.1177/0146621692016004
  • Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). Routledge.
  • Brookhart, S. M. (2014). How to design questions and tasks to assess student thinking. ASCD.
  • Cabello, V. M., & Topping, K. J. (2020). Peer assessment of teacher performance. What works in teacher education? International Journal of Cognitive Research in Science, Engineering and Education (IJCRSEE), 8(2), 121-132. https://doi.org/10.5937/IJCRSEE2002121C
  • De Marsico, M., Sciarrone, F., Sterbini, A., & Temperini, M. (2017). Supporting mediated peer-evaluation to grade answers to open-ended questions. EURASIA Journal of Mathematics Science and Technology Education, 13(4), 1085-1106. https://doi.org/10.12973/eurasia.2017.00660a
  • Demir, M. K. (2012). Analyzing empathy skills of primary school teacher candidates. Buca Faculty of Education Journal, (33), 107-121.
  • Dochy, F., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer and co-assessment in higher education. Studies in Higher Education, 24(3), 331-350. https://doi.org/10.1080/03075079912331379935
  • Ebel, R.L., & Frisbie, D.A. (1991). Essentials of educational measurement. Prentice Hall.
  • Eckes, T. (2015). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments. Peter Lang.
  • Eke, C. (2018). Analysis of objectives of high school physics curriculum according to the revised Bloom's taxonomy. Journal of Social Research and Behavioral Sciences, 4(6), 69-84.
  • Ercoşkun, M. H., & Nalçacı, A. (2008). The investigation of the empathic skills and democratic attitudes of the primary school teacher candidates. Milli Eğitim Dergisi, 37(180), 204-215.
  • Ercoşkun, M. H., Dilekmen, M., Ada, Ş., & Nalçacı, A. (2006). The investigation of empathic skills of the department of primary school teaching students as regards individual variations. Educational Academic Research, (13), 207–217.
  • Erkayıran, O., Şenocak, S. Ü., & Demirkıran, F. (2018). Investigation of empathic skill levels of nursing students in terms of some variables: A cross-sectional study. Journal of Nursing Science, 1(2), 01–04.
  • Erman-Aslanoglu, A., Karakaya, İ., & Sata, M. (2020). Evaluation of university students’ rating behaviors in self and peer rating process via many-facet Rasch model. Eurasian Journal of Educational Research, 20(89), 25-46. https://izlik.org/JA58KH69DL
  • Falchikov, N., & Boud, D. (1989). Student self-assessment in higher education: A meta-analysis. Review of Educational Research, 59(4), 395–430. https://doi.org/10.2307/1170205
  • Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3), 287-322. https://doi.org/10.2307/1170785
  • Fang, J.-W., Chang, S.-C., Hwang, G.-J., & Yang, G. (2021). An online collaborative peer‑assessment approach to strengthening pre-service teachers' digital content development competence and higher‑order thinking tendency. Education Tech Research Dev, 69, 1155–1181. https://doi.org/10.1007/s11423-021-09990-7
  • Farrokhi, F., Esfandiari, R., & Schaefer, E. (2012). A many-facet Rasch measurement of differential rater severity/leniency in three types of assessment. JALT Journal, 34(1), 79-101.
  • Farrokhi, F., Esfandiari, R., & Vaez Dalili, M. (2011). Applying the many-facet Rasch model to detect centrality in self-assessment, peer-assessment and teacher assessment. World Applied Sciences Journal, 15(11), 76-83.
  • Fraenkel, J. R., & Wallen, N. E. (2005). How to design and evaluate research in education. McGraw-Hill.
  • Genç, S. Z., & Kalafat, T. (2010). Prospective teachers’ problem solving skills and emphatic skills. Journal of Theoretical Educational Science, 3(2), 135-147.
  • Gielen, S., Dochy, F., & Onghena, P. (2010). An inventory of peer assessment diversity. Assessment and Evaluation in Higher Education, 36(2), 137-155. https://doi.org/10.1080/02602930903221444
  • Gierl, M. J., Latifi, S., Lai, H., Boulais, A. P., & Champlain, A. D. (2014). Automated essay scoring and the future of educational assessment in medical education. Medical Education, 48, 950-962. https://doi.org/10.1111/medu.12517
  • Goodrich, H. (1997). Understanding Rubrics: The dictionary may define "rubric", but these models provide more clarity. Educational Leadership, 54(4), 14-17.
  • Heritage, M. (2007). Formative assessment: What do teachers need to know and do? Phi Delta Kappan.  https://kappanonline.org/formative-assessment-heritage/
  • Kane, J., Bernardin, H., Villanueva, J., & Peyrefitte, J. (1995). Stability of rater leniency: Three studies. Academy of Management Journal, 38(4), 1036-1051. https://doi.org/10.2307/256619
  • Kane, J. S., & Lawler, E. E. (1978). Methods of peer assessment. Psychological Bulletin, 85(3), 555-586. https://doi.org/10.1037/0033-2909.85.3.555
  • Karakaya, İ. (2015). Comparison of self, peer, and instructor assessments in the portfolio assessment by using the many-facet Rasch model. Journal of Education and Human Development, 4(2), 182-192. https://doi.org/10.15640/jehd.v4n2a22
  • Kim, Y., Park, I., & Kang, M. (2012). Examining rater effects of the TGMD-2 on children with intellectual disability. Adapted Physical Activity Quarterly, 29(4), 346-365. https://doi.org/10.1123/apaq.29.4.346
  • Knoch, U., Read, J., & von Randow, T. (2007). Re-training writing raters online: How does it compare with face-to-face training? Assessing Writing, 12(2), 26-43. https://doi.org/10.1016/j.asw.2007.04.001
  • Kyllonen, P. C. (2012). Measurement of 21st century skills within the common core state standards. Paper presented at the Invitational Research Symposium on Technology Enhanced Assessments, USA, May 2012.
  • La Velle, L. (2019). The theory–practice nexus in teacher education: New evidence for effective approaches. Journal of Education for Teaching, 45(4), 369-372. https://doi.org/10.1080/02607476.2019.1639267
  • Lejk, M. & Wyvill, M. (2001). The effect of the inclusion of self-assessment with peer-assessment of contributions to a group project: A Quantitative study of secret and agreed assessments. Assessment and Evaluation in Higher Education, 26(6), 551–61. https://doi.org/10.1080/02602930120093887
  • Leonard, D. K., & Jiang, J. (1999). Gender bias and the college predictors of the SATs: A cry of Despair. Research in Higher Education, 40(4), 375-407.
  • Li, H., Xiong, Y., Hunter, C. V., Guo, X., & Tywoniw, R. (2020). Does peer assessment promote student learning? A meta-analysis. Assessment and Evaluation in Higher Education, 45(2), 193-211. https://doi.org/10.1080/02602938.2019.1620679
  • Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
  • Linacre, J. M. (2012). FACETS (Version 3.70.1) [Computer Software]. https://www.winsteps.com/facgood.htm
  • Linacre, J. M. (2017). FACETS (Version 3.80.0) [Computer Software]. https://www.winsteps.com/facgood.htm
  • Main, J. B., & Sánchez-Peña, M. (2015). Student evaluations of team members: Is there gender bias? Paper presented at IEEE Frontiers in Education Conference (FIE), TX, USA, October, 2015. https://doi.org/10.1109/FIE.2015.7344177.
  • Maslach, C., & Jackson, S. E. (1981). The measurement of experienced burnout. Journal of Occupational Behavior, 2(2), 99–113. https://doi.org/10.1002/job.4030020205
  • Mertler, C. A. (2016). Classroom assessment: A practical guide for educators. Routledge.
  • Ministry of National Education (2017). Fen bilimleri dersi öğretim programı (ilkokul ve ortaokul 3, 4, 5, 6, 7 ve 8 .sınıflar) [Science course curriculum (primary and secondary school 3rd, 4th, 5th, 6th, 7th and 8th grades)]. Ankara, Turkey
  • Mumpuni, K. E., Priyayi, D. F., & Widoretno, S. (2022). How do students perform a peer assessment? International Journal of Instruction, 15(3), 751-766. https://doi.org/10.29333/iji.2022.15341a
  • Myford, C. M., & Wolfe, E. W. (2000). Strengthening the ties that bind: Improving the linking network in sparsely connected rating designs. Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2000.tb01832.x
  • Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386-422.
  • Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189-227.
  • Nilsson, P. (2013). What do we know and where do we go? Formative assessment in developing student teachers’ professional learning of teaching science. Teachers and Teaching, 19(2), 188-201. https://doi.org/10.1080/13540602.2013.741838
  • Oluwatayo, J. A., & Adebule, S. O. (2012). Assessment of teaching performance of student-teachers on teaching practice. International Education Studies, 5(5), 109-115. https://doi.org/10.5539/ies.v5n5p109
  • Osman, S. (2021). Basic school teachers’ assessment practices in the sissala east municipality, Ghana. European Journal of Education Studies, 8(7), 44-74. https://doi.org/10.46827/ejes.v8i7.3801
  • Palmer, K., & Richardson, P. (2003). On-line assessment and free-response input-a pedagogic and technical model for squaring the circle. Paper presented at Proc. 7th CAA Conference, Loughborough, UK, December 2003.
  • Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory Into Practice, 48(1), 4–11. https://doi.org/10.1080/00405840802577536
  • Rahmawati, Y., Ridwan, A., Hadinugrahaningsih, T., & Soeprijanto. (2019, January). Developing critical and creative thinking skills through STEAM integration in chemistry learning. In Journal of Physics: Conference Series (Vol. 1156, p. 012033). IOP Publishing.
  • Sadler, P. M., & Good, E. (2006). The impact of self- and peer-grading on student learning. Educational Assessment, 11(1), 1-31.
  • Sari, D.K., Dinata, P. A. C., & Uspayanti, R. (2022). The teachers’ competencies to develop assessments for high school students in Merauke. Ishlah: Jurnal Pendidikan, 14(3), 3199 – 3206. https://doi.org/10.35445/alishlah.v14i
  • Sasmaz-Oren, F. (2012). The effects of gender and previous experience on the approach of self and peer assessment: A case from Turkey. Innovations in Education and Teaching International, 49(2), 123-133. https://doi.org/10.1080/14703297.2012.677598
  • Şata, M. (2022). Açık uçlu sorular [Open-ended questions]. In İ. Karakaya (Ed.), Açık uçlu soruların hazırlanması, uygulanması ve değerlendirilmesi [Preparation, administration and evaluation of open-ended questions] (pp. 1-11). Pegem.
  • Şata, M., & Karakaya, İ. (2021). Investigating the effect of rater training on differential rater function in assessing academic writing skills of higher education students. Journal of Measurement and Evaluation in Education and Psychology, 12(2), 163-181. https://doi.org/10.21031/epod.842094
  • Shen, B., & Bai, B. (2019). Facilitating university teachers’ continuing professional development through peer-assisted research and implementation teamwork in China. Journal of Education for Teaching, 45(4), 476-480. https://doi.org/10.1080/02607476.2019.1639265
  • Soland, J., Hamilton, L. S., & Stecher, B. M. (2013). Measuring 21st century competencies: Guidance for educators. RAND Corporation.
  • Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285. https://doi.org/10.1207/s15516709cog1202_4
  • Takeda, S., & Homberg, F. (2014). The effects of gender on group work process and achievement: An analysis through self- and peer-assessment. British Educational Research Journal, 40(2), 373–396.   https://doi.org/10.1002/berj.3088
  • Taş, U. E., Arıcı, Ö., Ozarkan, H. B., & Özgürlük, B. (2016). PISA 2015 ulusal raporu [PISA 2015 national report]. Ankara, Turkey: Milli Eğitim Bakanlığı Yayınları.
  • Torres-Guijarro, S., & Bengoechea, M. (2017). Gender differential in self-assessment: A fact neglected in higher education peer and self-assessment techniques. Higher Education Research and Development, 36(5), 1072-1084. https://doi.org/10.1080/07294360.2016.1264372
  • van Zundert, M., Sluijsmans, D., & van Merriënboer, J. J. G. (2010). Effective peer assessment processes: Research findings and future directions. Learning and Instruction, 20(4), 270-279. https://doi.org/10.1016/j.learninstruc.2009.08.004
  • Van Trieste, R. F. (1990). The relation between Puerto Rican university students’ attitudes toward Americans and the students’ achievement in English as a second language. Homines, (13–14), 94–112.
  • Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.
  • Wainer, H., & Steinberg, L. S. (1992). Sex differences in performance on the mathematics section of the Scholastic Aptitude Test: A bidirectional validity study. Harvard Educational Review, 62(3), 323-336. https://doi.org/10.17763/haer.62.3.1p1555011301r133
  • Wen, M. L., & Tsai, C. (2008). Online peer assessment in an in-service science and mathematics teacher education course. Teaching in Higher Education, 13(1), 55-67. https://doi.org/10.1080/13562510701794050
  • Wilson, F. R., Pan, W., & Schumsky, D. A. (2012). Recalculation of the critical values for Lawshe’s content validity ratio. Measurement and Evaluation in Counseling and Development, 45(3), 197-210. https://doi.org/10.1177/07481756124402
  • Winke, P., Gass, S., & Myford, C. (2013). Raters’ L2 background as a potential source of bias in rating oral performance. Language Testing, 30(2), 231-252. https://doi.org/10.1177/0265532212456968
  • Wolfe, E. W., & McVay, A. (2012). Application of latent trait models to identify raters exhibiting score scale usage problems. Applied Measurement in Education, 25(2), 125–143.
  • Yaz, Ö., & Kurnaz, M. (2017). The Examination of 2013 Science Curricula. International Journal of Turkish Education Science, 2017(8), 173-184.
  • Yenen, E. T. (2021). Prospective teachers’ professional skill needs: A Q method analysis. Teacher Development, 25(2), 196–214. https://doi.org/10.1080/13664530.2021.1877188
  • Yeşilçınar, S., & Şata, M. (2021). Examining rater biases of peer assessors in different assessment environments. International Journal of Psychology and Educational Studies, 8(4), 136-151. https://izlik.org/JA45TW35KZ
There are 82 citations in total.

Details

Primary Language English
Subjects Item Response Theory
Journal Section Research Article
Authors

Emine Yavuz 0000-0002-1991-1416

Mehmet Şata 0000-0003-2683-4997

Submission Date August 6, 2025
Acceptance Date March 9, 2026
Publication Date April 1, 2026
IZ https://izlik.org/JA55GA56YR
Published in Issue Year 2026 Volume: 17 Issue: 1

Cite

APA Yavuz, E., & Şata, M. (2026). Assessing Pre-Service Teachers’ Competencies in Open-Ended Item Development: Self-, Peer, Instructor Assessment. Journal of Measurement and Evaluation in Education and Psychology, 17(1), 24-41. https://izlik.org/JA55GA56YR