Research Article

Investigating the Validity of Rating Scales for B1 Level Turkish L2 Writing Assessment through Argument-based validation and Standard Setting

Year 2025, Volume: 42, Issue: 3, 1-18, 31.12.2025
https://doi.org/10.52597/buje.1569563

Abstract

Turkish as a second/foreign language (Turkish L2) has gained considerable popularity over the last two decades. However, most studies have concentrated on instructional case studies, leaving assessment-related aspects largely unexamined. The current study aims to validate rating scales for B1-level Turkish L2 writing through argument-based validation. Papers written by students at three adjacent levels (A2, B1, and B2) were scored and compared. Standard setting was employed to provide backing for the validity claim. With acceptable reliability and consistency measures for both the rating and the standard-setting procedures, the paper draws on the evaluation, generalization, and explanation inferences to build the validity structure. The findings provide empirical backing for these key inferences, lending substantial support to the validity argument for using the developed scales to assess B1-level Turkish L2 writing. Given the gap in validation research on Turkish L2, this paper offers a practical and rigorous guideline for researchers and practitioners in the field.
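
The abstract refers to "acceptable reliability and consistency measures" for the rating and standard-setting procedures and to a comparison of papers from three adjacent levels. As a purely illustrative aid, and not the author's actual analysis, the minimal Python sketch below shows on invented data the two kinds of checks such a claim typically rests on: inter-rater consistency and a between-level comparison of scores. All variable names and numbers are hypothetical.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical totals (0-100) assigned by two raters to the same 30 B1 papers.
rater1 = rng.normal(70, 8, 30)
rater2 = rater1 + rng.normal(0, 4, 30)  # the second rater broadly agrees

r, p = stats.pearsonr(rater1, rater2)
print(f"Inter-rater correlation: r = {r:.2f}, p = {p:.3f}")

# Hypothetical scores for papers written at three adjacent proficiency levels.
a2 = rng.normal(55, 10, 30)
b1 = rng.normal(70, 10, 30)
b2 = rng.normal(82, 10, 30)

# One-way ANOVA: do adjacent levels differ on the B1 scale, as the
# explanation inference would predict?
f_stat, p_anova = stats.f_oneway(a2, b1, b2)
print(f"Adjacent-level comparison: F = {f_stat:.2f}, p = {p_anova:.3f}")

In practice, a study of this kind would report more robust indices (for example, intraclass correlations or many-facet Rasch measures) and effect sizes, but the sketch conveys the logic behind the evaluation and explanation inferences mentioned above.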

Details

Primary Language: English
Subjects: Measurement Theories and Applications in Education and Psychology; Scale Development, Standard Setting, and Norms; Curriculum Development in Education
Section: Research Article
Authors

Yigit Savuran (ORCID: 0000-0003-3009-8005)

Submission Date: 18 October 2024
Acceptance Date: 14 May 2025
Publication Date: 31 December 2025
Published in Issue: Year 2025, Volume: 42, Issue: 3

How to Cite

APA Savuran, Y. (2025). Investigating the Validity of Rating Scales for B1 Level Turkish L2 Writing Assessment through Argument-based validation and Standard Setting. Bogazici University Journal of Education, 42(3), 1-18. https://doi.org/10.52597/buje.1569563

Unless otherwise stated, all content on this site is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).