Research Article

Using Rasch analysis to examine raters’ expertise and Turkish teacher candidates’ competency levels in writing different types of test items

Year 2022, Volume 9, Issue 4, 998 - 1012, 22.12.2022
https://doi.org/10.21449/ijate.1058300

Abstract

The aim of the present study was to examine Turkish teacher candidates’ competency levels in writing different types of test items by means of Rasch analysis. In addition, the study examined how the expertise of the raters who scored the candidates’ items affected these competency estimates. A total of 84 Turkish teacher candidates participated in the study, which was conducted with the relational survey model, a quantitative research design. Three raters took part in the scoring process: an expert in Turkish education, an expert in measurement and evaluation, and an expert in both Turkish education and measurement and evaluation. The teacher candidates wrote true-false, short-response, multiple-choice, and open-ended items in accordance with the Test Item Development Form, and the raters scored each item on a scale of 1 to 5 using the item evaluation rubric prepared for each item type. The results showed that the Turkish teacher candidates were most competent in writing true-false items and least competent in writing multiple-choice items. Moreover, raters’ expertise had an effect on the candidates’ estimated competency in writing the different item types. Finally, the rater with expertise in both Turkish education and measurement and evaluation showed the highest scoring reliability, whereas the rater with expertise only in measurement and evaluation showed the relatively lowest scoring reliability.
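
The design summarized in the abstract corresponds to a many-facet Rasch measurement (MFRM) model with three facets (teacher candidates, item types, and raters) and a 1-5 rubric scale, commonly written as log(P_njik / P_nji(k-1)) = B_n - D_i - C_j - F_k, where B_n is the candidate’s item-writing proficiency, D_i the difficulty of the item type, C_j the severity of the rater, and F_k the threshold of rubric category k. The sketch below only illustrates this general model, not the authors’ actual FACETS analysis; the function name and all numerical values are hypothetical.

    import math

    def mfrm_category_probs(theta, item_type_difficulty, rater_severity, thresholds):
        """Category probabilities for a 1..K rubric under a three-facet
        rating-scale Rasch model (candidate x item type x rater).

        theta                -- candidate's item-writing proficiency (logits)
        item_type_difficulty -- difficulty of writing this item type (logits)
        rater_severity       -- severity of the rater giving the score (logits)
        thresholds           -- K-1 step thresholds (logits) for a 1..K rubric
        """
        # Category 1 is the reference category; each higher category adds one step term.
        cumulative = [0.0]
        running = 0.0
        for tau in thresholds:
            running += theta - item_type_difficulty - rater_severity - tau
            cumulative.append(running)
        exps = [math.exp(c) for c in cumulative]
        total = sum(exps)
        return [e / total for e in exps]

    # Illustrative values: an average candidate, a relatively hard item type
    # (e.g., multiple choice), a slightly severe rater, and evenly spaced thresholds.
    probs = mfrm_category_probs(theta=0.0, item_type_difficulty=0.8,
                                rater_severity=0.3, thresholds=[-1.5, -0.5, 0.5, 1.5])
    print([round(p, 3) for p in probs])  # probabilities for rubric scores 1..5

Under this parameterization, a harder item type or a more severe rater shifts probability toward lower rubric scores, which is what allows the analysis to separate rater severity from candidates’ item-writing competency.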

Details

Primary Language: English
Subjects: Other Fields of Education
Journal Section: Articles
Authors: Ayfer Sayın (ORCID: 0000-0003-1357-5674); Mehmet Şata (ORCID: 0000-0003-2683-4997)

Publication Date: December 22, 2022
Submission Date: January 15, 2022
Published in Issue: Year 2022, Volume 9, Issue 4

Cite

APA: Sayın, A., & Şata, M. (2022). Using Rasch analysis to examine raters’ expertise and Turkish teacher candidates’ competency levels in writing different types of test items. International Journal of Assessment Tools in Education, 9(4), 998-1012. https://doi.org/10.21449/ijate.1058300
