Araştırma Makalesi

Yapısal Konu Modellemesi Yoluyla Eğitimde Ölçme Alanındaki Eğilimler ve İçgörüler: Dil Değerlendirmesi Üzerine Bir İnceleme

Yıl 2026, Cilt: 27 Sayı: 1, 290 - 317, 31.01.2026
https://doi.org/10.29299/kefad.1732570

Öz

Bu araştırmada, eğitimde ölçme alanındaki tematik eğilimleri ve araştırma yönelimlerini ortaya koymak amacıyla Yapısal Konu Modellemesi (STM) kullanılmıştır. Bu doğrultuda, örnek bir alt alan uygulaması olarak Language Testing ve Language Assessment Quarterly dergilerinde son 16 yılda yayımlanan toplam 778 makale analiz edilmiştir. STM analizi, en belirgin konuların “Dil Testinin Sosyal, Politik ve Etik Boyutları”, “Dil Değerlendirme Okuryazarlığının Geliştirilmesi” ve “Okuma ve Dinleme Değerlendirmelerinde Psikometrik Yaklaşımlar” olduğu on farklı temayı ortaya koymuştur. Çalışmada ayrıca, değerlendirici güvenirliğine ilişkin kritik sorunlar vurgulanmakta ve bu konunun dil değerlendirme araştırmalarındaki merkezi rolüne dikkat çekilmektedir. Ayrıca, işaret dili ve iki dillilik bağlamlarında özellikle sözcük bilgisinin dil yeterliğindeki rolüne ilişkin birbiriyle bağlantılı iki tema öne çıkmaktadır. Dil testinin sosyal, politik ve etik boyutlarına artan vurgu, alanın etkisinin yalnızca yeterlilik ölçümüyle sınırlı kalmayıp eğitim politikalarını ve uygulamalarını da şekillendirdiğini göstermektedir. Psikometrik yöntemlerin ve dil değerlendirme okuryazarlığının öne çıkması ise alandaki süregelen kuramsal ve yöntemsel gelişmelere işaret etmektedir. Bu bulgular, dil değerlendirme araştırmalarındaki önceliklerin ve yönelimlerin nasıl değiştiğine ilişkin olarak araştırmacılar, politika yapıcılar ve uygulayıcılar için önemli içgörüler sunmaktadır.

Kaynakça

  • Aryadoust, V., Eckes, T., & In’nami, Y. (2021). Editorial: Frontiers in Language Assessment and Testing. Frontiers in Psychology, 12. https://doi.org/10.3389/fpsyg.2021.691614
  • Aryadoust, V., Goh, C. C. M., & Kim, L. O. (2011). An investigation of differential item functioning in the MELAB listening test. Language Assessment Quarterly, 8(4), 361–385. https://doi.org/10.1080/15434303.2011.628632
  • Aryadoust, V., Zakaria, A., Lim, M. H., & Chen, C. (2020). An extensive knowledge mapping review of measurement and validity in language assessment and SLA research. Frontiers in Psychology, 11. https://doi.org/10.3389/fpsyg.2020.01941
  • Bachman, L. F., & Clark, J. L. D. (1987). The measurement of Foreign/Second Language Proficiency. The Annals of the American Academy of Political and Social Science, 490(1), 20–33. https://doi.org/10.1177/0002716287490001003
  • Bae, J., Bentler, P. M., & Lee, Y. (2016). On the role of content in writing assessment. Language Assessment Quarterly, 13(4), 302–328. https://doi.org/10.1080/15434303.2016.1246552
  • Baker, B. A., & Riches, C. (2017). The development of EFL examinations in Haiti: Collaboration and language assessment literacy development. Language Testing, 35(4), 557–581. https://doi.org/10.1177/0265532217716732
  • Banks, G. C., Woznyj, H. M., Wesslen, R. S., & Ross, R. L. (2018). A review of best practice recommendations for text analysis in R (and a User-Friendly app). Journal of Business and Psychology, 33(4), 445–459. https://doi.org/10.1007/s10869-017-9528-3
  • Barkaoui, K. (2010a). Explaining ESL essay holistic scores: A multilevel modeling approach. Language Testing, 27(4), 515–535. https://doi.org/10.1177/0265532210368717
  • Barkaoui, K. (2010b). Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing, 28(1), 51–75. https://doi.org/10.1177/0265532210376379
  • Barkaoui, K. (2010c). Variability in ESL essay rating processes: The role of the rating scale and rater experience. Language Assessment Quarterly, 7(1), 54–74. https://doi.org/10.1080/15434300903464418
  • Barkaoui, K. (2024). The Academic Achievement of Undergraduate Students with Different English Language Proficiency Profiles. Language Assessment Quarterly, 21(3), 224–244. https://doi.org/10.1080/15434303.2024.2346089
  • Barkaoui, K. (2025). The relationship between English language proficiency test scores and academic achievement: A longitudinal study of two tests. Language Testing, 0(0). https://doi.org/10.1177/02655322251319284
  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022. https://doi.org/10.5555/944919.944937
  • Bochner, J. H., Samar, V. J., Hauser, P. C., Garrison, W. M., Searls, J. M., & Sanders, C. A. (2015). Validity of the American Sign Language Discrimination Test. Language Testing, 33(4), 473–495. https://doi.org/10.1177/0265532215590849
  • Carlsen, C. H., & Rocca, L. (2021). Language test misuse. Language Assessment Quarterly, 18(5), 477–491. https://doi.org/10.1080/15434303.2021.1947288
  • Cho, Y., & Bridgeman, B. (2012). Relationship of TOEFL iBT® scores to academic performance: Some evidence from American universities. Language Testing, 29(3), 421–442. https://doi.org/10.1177/0265532211430368
  • Choi, H., & Woo, J. (2022). Investigating emerging hydrogen technology topics and comparing national level technological focus: Patent analysis using a structural topic model. Applied Energy, 313, 118898. https://doi.org/10.1016/j.apenergy.2022.118898
  • Coghlan, S., Miller, T., & Paterson, J. (2021). Good proctor or “big brother”? Ethics of online exam supervision technologies. Philosophy & Technology, 34(4), 1581–1606. https://doi.org/10.1007/s13347-021-00476-1
  • Eckes, T. (2012). Operational Rater types in writing assessment: linking rater cognition to rater behavior. Language Assessment Quarterly, 9(3), 270–292. https://doi.org/10.1080/15434303.2011.64938
  • Elder, C., & McNamara, T. (2015). The hunt for “indigenous criteria” in assessing communication in the physiotherapy workplace. Language Testing, 33(2), 153–174. https://doi.org/10.1177/0265532215607398
  • Fan, J., & Yan, X. (2020). Assessing Speaking Proficiency: A narrative review of speaking assessment research within the Argument-Based Validation Framework. Frontiers in Psychology, 11. https://doi.org/10.3389/fpsyg.2020.00330
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education. McGraw-Hill.
  • Gamaroff, R. (2000). Rater reliability in language assessment: The bug of all bears. System, 28(1), 31–53. https://doi.org/10.1016/S0346-251X(99)00059-7
  • Gardner, R. C., & MacIntyre, P. D. (1992). A student’s contributions to second language learning. Part I: Cognitive variables. Language Teaching, 25(4), 211–220. https://doi.org/10.1017/S026144480000700X
  • Gokturk, N., & Chukharev, E. (2024). Exploring the potential of a spoken Dialog System-Delivered Paired Discussion task for assessing interactional competence. Language Assessment Quarterly, 21(1), 60–99. https://doi.org/10.1080/15434303.2023.2289173
  • Hamdani, S., Chan, A., Kan, R., Chiat, S., Gagarina, N., Haman, E., … Armon-Lotem, S. (2024). Identifying developmental language disorder (DLD) in multilingual children: A case study tutorial. International Journal of Speech-Language Pathology, 1–15. https://doi.org/10.1080/17549507.2024.2326095
  • Hauck, M. C., Wolf, M. K., & Mislevy, R. (2016). Creating a Next-Generation system of K-12 English learner language proficiency assessments. ETS Research Report Series, 2016(1), 1–10. https://doi.org/10.1002/ets2.12092
  • Huang, F. L., & Konold, T. R. (2013). A latent variable investigation of the Phonological Awareness Literacy Screening-Kindergarten assessment: Construct identification and multigroup comparisons between Spanish-speaking English-language learners (ELLs) and non-ELL students. Language Testing, 31(2), 205–221. https://doi.org/10.1177/0265532213496773
  • Isaacs, T., Hu, R., Trenkic, D., & Varga, J. (2023). Examining the predictive validity of the Duolingo English Test: Evidence from a major UK university. Language Testing, 40(3), 748–770. https://doi.org/10.1177/02655322231158550
  • Isaacs, T., & Thomson, R. I. (2013). Rater experience, rating scale length, and judgments of L2 pronunciation: Revisiting research conventions. Language Assessment Quarterly, 10(2), 135–159. https://doi.org/10.1080/15434303.2013.769545
  • Isbell, D. R., Kremmel, B., & Kim, J. (2023). Remote proctoring in Language Testing: Implications for fairness and justice. Language Assessment Quarterly, 20(4–5), 469–487. https://doi.org/10.1080/15434303.2023.2288251
  • Jang, E. E., Cummins, J., Wagner, M., Stille, S., & Dunlop, M. (2015). Investigating the homogeneity and distinguishability of STEP proficiency descriptors in assessing English language learners in Ontario schools. Language Assessment Quarterly, 12(1), 87–109. https://doi.org/10.1080/15434303.2014.936602
  • Javidanmehr, Z., & Sarab, M. R. A. (2019). Retrofitting non-diagnostic reading comprehension assessment: Application of the G-DINA model to a high-stake reading comprehension test. Language Assessment Quarterly, 16(3), 294–311. https://doi.org/10.1080/15434303.2019.1654479
  • Kessler, G. (2018). Technology and the future of language teaching. Foreign Language Annals, 51(1), 205–218.
  • Kokhan, K. (2012). Investigating the possibility of using TOEFL scores for university ESL decision-making: Placement trends and effect of time lag. Language Testing, 29(2), 291–308. https://doi.org/10.1177/0265532211429403
  • Kotowicz, J., Woll, B., & Herman, R. (2020). Adaptation of the British Sign Language Receptive Skills Test into Polish Sign Language. Language Testing, 38(1), 132–153. https://doi.org/10.1177/0265532220924598
  • Kozaki, Y. (2010). An alternative decision-making procedure for performance assessments: Using the multifaceted Rasch model to generate cut estimates. Language Assessment Quarterly, 7(1), 75–95. https://doi.org/10.1080/15434300903464400
  • Kremmel, B., & Schmitt, N. (2016). Interpreting vocabulary test scores: What do various item formats tell us about learners’ ability to employ words? Language Assessment Quarterly, 13(4), 377–392. https://doi.org/10.1080/15434303.2016.1237516
  • Kuhn, K. D. (2018). Using structural topic modeling to identify latent topics and trends in aviation incident reports. Transportation Research Part C Emerging Technologies, 87, 105–122. https://doi.org/10.1016/j.trc.2017.12.018
  • Kunnan, A. J. (2009). Testing for citizenship: The U.S. naturalization test. Language Assessment Quarterly, 6(1), 89–97. https://doi.org/10.1080/15434300802606630
  • Kyle, K., & Crossley, S. (2017). Assessing syntactic sophistication in L2 writing: A usage-based approach. Language Testing, 34(4), 513–535. https://doi.org/10.1177/0265532217712554
  • Kyle, K., Crossley, S. A., & Jarvis, S. (2021). Assessing the validity of lexical diversity indices using direct judgements. Language Assessment Quarterly, 18(2), 154–170. https://doi.org/10.1080/15434303.2020.1844205
  • Lam, D. M. K. (2019). Interactional competence with and without extended planning time in a group oral assessment. Language Assessment Quarterly, 16(1), 1–20. https://doi.org/10.1080/15434303.2019.1602627
  • Lam, R. (2014). Language assessment training in Hong Kong: Implications for language assessment literacy. Language Testing, 32(2), 169–197. https://doi.org/10.1177/0265532214554321
  • Laufer, B., & McLean, S. (2016). Loanwords and vocabulary size test scores: A case of different estimates for different L1 learners. Language Assessment Quarterly, 13(3), 202–217. https://doi.org/10.1080/15434303.2016.1210611
  • Li, X., Dai, A., Tran, R., & Wang, J. (2023). Text mining-based identification of promising miRNA biomarkers for diabetes mellitus. Frontiers in Endocrinology, 14. https://doi.org/10.3389/fendo.2023.1195145
  • Liu, H. Y., You, X. F., Wang, W. Y., Ding, S. L., & Chang, H. H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30(2), 152–172. https://doi.org/10.1007/s00357-013-9128-5
  • Liu, T., Aryadoust, V., & Foo, S. (2021). Examining the factor structure and its replicability across multiple listening test forms: Validity evidence for the Michigan English Test. Language Testing, 39(1), 142–171. https://doi.org/10.1177/02655322211018139
  • Manias, E., & McNamara, T. (2016). Standard setting in specific-purpose language testing: What can a qualitative study add? Language Testing, 33(2), 235–249. https://doi.org/10.1177/0265532215608411
  • May, L. (2011). Interactional competence in a paired speaking test: Features salient to raters. Language Assessment Quarterly, 8(2), 127–145. https://doi.org/10.1080/15434303.2011.565845
  • McNamara, T. (2009). Australia: The dictation tests redux? Language Assessment Quarterly, 6(1), 106–111. https://doi.org/10.1080/15434300802606663
  • McNamara, T., & Ryan, K. (2011). Fairness versus justice in language testing: The place of English literacy in the Australian citizenship Test. Language Assessment Quarterly, 8(2), 161–178. https://doi.org/10.1080/15434303.2011.565438
  • Min, S., & He, L. (2014). Applying unidimensional and multidimensional item response theory models in testlet-based reading assessment. Language Testing, 31(4), 453–477. https://doi.org/10.1177/0265532214527277
  • Min, S., Cai, H., & He, L. (2021). Application of bi-factor MIRT and higher-order CDM models to an in-house EFL listening test for diagnostic purposes. Language Assessment Quarterly, 19(2), 189–213. https://doi.org/10.1080/15434303.2021.1980571
  • Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422.
  • O’Hagan, S., Pill, J., & Zhang, Y. (2015). Extending the scope of speaking assessment criteria in a specific-purpose language test: Operationalizing a health professional perspective. Language Testing, 33(2), 195–216. https://doi.org/10.1177/0265532215607920
  • Olson, D. J. (2023). Measuring bilingual language dominance: An examination of the reliability of the Bilingual Language Profile. Language Testing, 40(3), 521–547. https://doi.org/10.1177/02655322221139162
  • Peña, E. D., Bedore, L. M., Lugo-Neris, M. J., & Albudoor, N. (2020). Identifying developmental language disorder in school-age bilinguals: Semantics, grammar, and narratives. Language Assessment Quarterly, 17(5), 541–558. https://doi.org/10.1080/15434303.2020.1827258
  • Pill, J. (2015). Drawing on indigenous criteria for more authentic assessment in a specific-purpose language test: Health professionals interacting with patients. Language Testing, 33(2), 175–193. https://doi.org/10.1177/0265532215607400
  • Plough, I. C., & Bogart, P. S. H. (2008). Perceptions of examiner behavior modulate power relations in oral performance testing. Language Assessment Quarterly, 5(3), 195–217. https://doi.org/10.1080/15434300802229375
  • Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4), 1064–1082. https://doi.org/10.1111/ajps.12103
  • Roberts, M. E., Stewart, B. M., & Tingley, D. (2019). stm: An R package for structural topic models. Journal of Statistical Software, 91(2). https://doi.org/10.18637/jss.v091.i02
  • Robles-García, P., McLean, S., Stewart, J., Shin, J. Y., & Sánchez-Gutiérrez, C. H. (2024). The development and initial validation of O-WSVLT, a meaning-recall online L2 Spanish vocabulary levels test. Language Assessment Quarterly, 21(2), 181–205. https://doi.org/10.1080/15434303.2024.2311724
  • Scarino, A. (2013). Language assessment literacy as self-awareness: Understanding the role of interpretation in assessment and in teacher learning. Language Testing, 30(3), 309–327. https://doi.org/10.1177/0265532213480128
  • Schaefer, E. (2008). Rater bias patterns in an EFL writing assessment. Language Testing, 25(4), 465–493. https://doi.org/10.1177/0265532208094273
  • Schissel, J. L., López-Gopar, M., Leung, C., Morales, J., & Davis, J. R. (2019). Classroom-based assessments in linguistically Diverse communities: a case for collaborative research methodologies. Language Assessment Quarterly, 16(4–5), 393–407. https://doi.org/10.1080/15434303.2019.1678041
  • Segbers, J., & Schroeder, S. (2017). How many words do children know? A corpus-based estimation of children’s total vocabulary size. Language Testing, 34(3), 297–320. https://doi.org/10.1177/0265532216641152
  • Shi, B., Huang, L., & Lu, X. (2020). Effect of prompt type on test-takers’ writing performance and writing strategy use in the continuation task. Language Testing, 37(3), 361–388. https://doi.org/10.1177/0265532220911626
  • Silge, J., & Robinson, D. (2016). tidytext: Text mining and analysis using tidy data principles in R. Journal of Open Source Software, 1(3), 37. https://doi.org/10.21105/joss.00037
  • Stewart, J., Vitta, J. P., Nicklin, C., McLean, S., Pinchbeck, G. G., & Kramer, B. (2021). The Relationship between Word Difficulty and Frequency: A Response to Hashimoto. Language Assessment Quarterly, 19(1), 90–101. https://doi.org/10.1080/15434303.2021.1992629
  • Tonidandel, S., Summerville, K. M., Gentry, W. A., & Young, S. F. (2021). Using structural topic modeling to gain insight into challenges faced by leaders. The Leadership Quarterly, 33(5), 101576. https://doi.org/10.1016/j.leaqua.2021.101576
  • Usman, N., Hendrik, H., & Madehang, M. (2024). Difficulties in understanding the TOEFL reading test of English language education study program at university. IDEAS: Journal on English Language Teaching and Learning, Linguistics and Literature, 12(1), 755–773. https://doi.org/10.24256/ideas.v12i1.5179
  • Vogt, K., Tsagari, D., & Spanoudis, G. (2020). What do teachers think they want? A comparative study of In-Service Language Teachers’ beliefs on LAL training needs. Language Assessment Quarterly, 17(4), 386–409. https://doi.org/10.1080/15434303.2020.1781128
  • Wang, P. A., & Hsieh, S. (2023). Incorporating structural topic modeling into short text analysis. Concentric Studies in Linguistics, 49(1), 96–138. https://doi.org/10.1075/consl.22026.wan
  • Wolfersberger, M. (2013). Refining the construct of Classroom-Based Writing-From-Readings Assessment: The role of task Representation. Language Assessment Quarterly, 10(1), 49–72. https://doi.org/10.1080/15434303.2012.750661
  • Youn, S. J. (2019). Managing proposal sequences in role-play assessment: Validity evidence of interactional competence across levels. Language Testing, 37(1), 76–106. https://doi.org/10.1177/0265532219860077

Trends and Insights in Educational Measurement through Structural Topic Modeling: A Study in Language Assessment


Abstract

In this study, Structural Topic Modeling (STM) was employed to identify thematic trends and research orientations within the field of educational measurement. Accordingly, as a representative subfield application, a total of 778 articles published over the past 16 years in the journals Language Testing and Language Assessment Quarterly were analyzed. The STM analysis identified ten distinct themes, the most prominent being “Social, Political, and Ethical Dimensions of Language Testing,” “Advancing Language Assessment Literacy,” and “Psychometric Approaches to Reading and Listening Assessment.” The study also highlights critical issues related to rater reliability, emphasizing its centrality in language assessment research. Furthermore, two interconnected themes emerge concerning the role of vocabulary in language proficiency, particularly in the contexts of sign language and bilingualism. The increasing emphasis on social, political, and ethical dimensions underscores the expanding impact of language testing beyond proficiency measurement, shaping policies and educational practices. Additionally, the prominence of psychometric methodologies and language assessment literacy reflects the field’s ongoing methodological and theoretical advancements. These findings offer valuable insights into emerging priorities and shifts in language assessment research for scholars, policymakers, and practitioners.
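The modeling step summarized above — representing each abstract as a mixture of latent topics — can be sketched in miniature. STM analyses of this kind are typically run with the stm R package (Roberts et al., 2019, cited below); the toy sampler here implements plain LDA, the simpler model STM builds on (STM additionally lets document metadata, such as publication year, shift topic prevalence). This is an illustrative stdlib-only sketch, not the authors' implementation; the function and variable names (`lda_gibbs`, `ndk`, `nkw`) are hypothetical.

```python
import random

def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA: a simplification of STM,
    which additionally models topic prevalence as a function of covariates."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    vidx = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    # z[d][n]: current topic assignment of the n-th word token in document d
    z = [[rng.randrange(k) for _ in d] for d in docs]
    ndk = [[0] * k for _ in docs]           # document-topic counts
    nkw = [[0] * v for _ in range(k)]       # topic-word counts
    nk = [0] * k                            # total tokens per topic
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            t = z[d][n]
            ndk[d][t] += 1; nkw[t][vidx[w]] += 1; nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                t, wi = z[d][n], vidx[w]
                # remove the token, then resample its topic from the
                # conditional p(topic | everything else)
                ndk[d][t] -= 1; nkw[t][wi] -= 1; nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights)[0]
                z[d][n] = t
                ndk[d][t] += 1; nkw[t][wi] += 1; nk[t] += 1
    # most probable word per topic, plus per-document topic counts
    top_words = [vocab[max(range(v), key=lambda i: nkw[j][i])] for j in range(k)]
    return top_words, ndk
```

On a corpus like the one analyzed here, such a model would be run over the 778 abstracts with k = 10, after which each topic is interpreted from its highest-probability words and labeled (e.g., "rater reliability"); STM proper would additionally regress topic prevalence on publication year to trace trends.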

Kaynakça

  • Aryadoust, V., Eckes, T., & In’nami, Y. (2021). Editorial: Frontiers in Language Assessment and Testing. Frontiers in Psychology, 12. https://doi.org/10.3389/fpsyg.2021.691614
  • Aryadoust, V., Goh, C. C. M., & Kim, L. O. (2011). An investigation of differential item functioning in the MELAB listening Test. Language Assessment Quarterly, 8(4), 361–385. https://doi.org/10.1080/15434303.2011.628632 Aryadoust, V., Zakaria, A., Lim, M. H., & Chen, C. (2020). An extensive knowledge mapping review of measurement and validity in language assessment and SLA research. Frontiers in Psychology, 11. https://doi.org/10.3389/fpsyg.2020.01941
  • Bachman, L. F., & Clark, J. L. D. (1987). The measurement of Foreign/Second Language Proficiency. The Annals of the American Academy of Political and Social Science, 490(1), 20–33. https://doi.org/10.1177/0002716287490001003
  • Bae, J., Bentler, P. M., & Lee, Y. (2016). On the role of content in writing assessment. Language Assessment Quarterly, 13(4), 302–328. https://doi.org/10.1080/15434303.2016.1246552
  • Baker, B. A., & Riches, C. (2017). The development of EFL examinations in Haiti: Collaboration and language assessment literacy development. Language Testing, 35(4), 557–581. https://doi.org/10.1177/0265532217716732
  • Banks, G. C., Woznyj, H. M., Wesslen, R. S., & Ross, R. L. (2018). A review of best practice recommendations for text analysis in R (and a User-Friendly app). Journal of Business and Psychology, 33(4), 445–459. https://doi.org/10.1007/s10869-017-9528-3
  • Barkaoui, K. (2010a). Explaining ESL essay holistic scores: A multilevel modeling approach. Language Testing, 27(4), 515-535. https://doi.org/10.1177/0265532210368717
  • Barkaoui, K. (2010b). Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing, 28(1), 51–75. https://doi.org/10.1177/0265532210376379
  • Barkaoui, K. (2010c). Variability in ESL essay rating processes: The role of the rating scale and rater experience. Language Assessment Quarterly, 7(1), 54–74. https://doi.org/10.1080/15434300903464418
  • Barkaoui, K. (2024). The Academic Achievement of Undergraduate Students with Different English Language Proficiency Profiles. Language Assessment Quarterly, 21(3), 224–244. https://doi.org/10.1080/15434303.2024.2346089
  • Barkaoui, K. (2025). The relationship between English language proficiency test scores and academic achievement: A longitudinal study of two tests. Language Testing, 0(0). https://doi.org/10.1177/02655322251319284
  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022. https://doi.org/10.5555/944919.944937
  • Bochner, J. H., Samar, V. J., Hauser, P. C., Garrison, W. M., Searls, J. M., & Sanders, C. A. (2015). Validity of the American Sign Language Discrimination Test. Language Testing, 33(4), 473–495. https://doi.org/10.1177/0265532215590849
  • Carlsen, C. H., & Rocca, L. (2021). Language test misuse. Language Assessment Quarterly, 18(5), 477–491. https://doi.org/10.1080/15434303.2021.1947288
  • Cho, Y., & Bridgeman, B. (2012). Relationship of TOEFL iBT® scores to academic performance: Some evidence from American universities. Language Testing, 29(3), 421–442. https://doi.org/10.1177/0265532211430368
  • Choi, H., & Woo, J. (2022). Investigating emerging hydrogen technology topics and comparing national level technological focus: Patent analysis using a structural topic model. Applied Energy, 313, 118898.https://doi.org/10.1016/j.apenergy.2022.118898
  • Coghlan, S., Miller, T., & Paterson, J. (2021). Good proctor or “big brother”? Ethics of online exam supervision technologies. Philosophy & Technology, 34(4), 1581–1606. https://doi.org/10.1007/s13347-021-00476-1
  • Eckes, T. (2012). Operational Rater types in writing assessment: linking rater cognition to rater behavior. Language Assessment Quarterly, 9(3), 270–292. https://doi.org/10.1080/15434303.2011.64938
  • Elder, C., & McNamara, T. (2015). The hunt for “indigenous criteria” in assessing communication in the physiotherapy workplace. Language Testing, 33(2), 153–174. https://doi.org/10.1177/0265532215607398
  • Fan, J., & Yan, X. (2020). Assessing Speaking Proficiency: A narrative review of speaking assessment research within the Argument-Based Validation Framework. Frontiers in Psychology, 11. https://doi.org/10.3389/fpsyg.2020.00330
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education. McGrawhill.
  • Gamaroff, R. (2000). Rater reliability in language assessment: The bug of all bears. System, 28(1), 31–53. https://doi.org/10.1016/S0346-251X(99)00059-7
  • Gardner, R. C., & MacIntyre, P. D. (1992). A student’s contributions to second language learning. Part I: Cognitive variables. Language Teaching, 25(4), 211–220. https://doi.org/10.1017/S026144480000700X
  • Gokturk, N., & Chukharev, E. (2024). Exploring the potential of a spoken Dialog System-Delivered Paired Discussion task for assessing interactional competence. Language Assessment Quarterly, 21(1), 60–99. https://doi.org/10.1080/15434303.2023.2289173
  • Hamdani, S., Chan, A., Kan, R., Chiat, S., Gagarina, N., Haman, E., … Armon-Lotem, S. (2024). Identifying developmental language disorder (DLD) in multilingual children: A case study tutorial. International Journal of Speech-Language Pathology, 1–15. https://doi.org/10.1080/17549507.2024.2326095
  • Hauck, M. C., Wolf, M. K., & Mislevy, R. (2016). Creating a Next-Generation system of K-12 English learner language proficiency assessments. ETS Research Report Series, 2016(1), 1–10. https://doi.org/10.1002/ets2.12092
  • Huang, F. L., & Konold, T. R. (2013). A latent variable investigation of the Phonological Awareness Literacy Screening-Kindergarten assessment: Construct identification and multigroup comparisons between Spanish-speaking English-language learners (ELLs) and non-ELL students. Language Testing, 31(2), 205–221. https://doi.org/10.1177/0265532213496773
  • Isaacs, T., Hu, R., Trenkic, D., & Varga, J. (2023). Examining the predictive validity of the Duolingo English Test: Evidence from a major UK university. Language Testing, 40(3), 748–770. https://doi.org/10.1177/02655322231158550
  • Isaacs, T., & Thomson, R. I. (2013). Rater experience, rating scale length, and judgments of L2 pronunciation: Revisiting research conventions. Language Assessment Quarterly, 10(2), 135–159. https://doi.org/10.1080/15434303.2013.769545
  • Isbell, D. R., Kremmel, B., & Kim, J. (2023). Remote proctoring in Language Testing: Implications for fairness and justice. Language Assessment Quarterly, 20(4–5), 469–487. https://doi.org/10.1080/15434303.2023.2288251
  • Jang, E. E., Cummins, J., Wagner, M., Stille, S., & Dunlop, M. (2015). Investigating the homogeneity and distinguishability of STEP proficiency descriptors in assessing English language learners in Ontario schools. Language Assessment Quarterly, 12(1), 87–109. https://doi.org/10.1080/15434303.2014.936602
  • Javidanmehr, Z., & Sarab, M. R. A. (2019). Retrofitting non-diagnostic reading comprehension assessment: Application of the G-DINA model to a high-stake reading comprehension test. Language Assessment Quarterly, 16(3), 294–311. https://doi.org/10.1080/15434303.2019.1654479
  • Kessler, G. (2018). Technology and the future of language teaching. Foreign Language Annals, 51(1), 205–218.
  • Kokhan, K. (2012). Investigating the possibility of using TOEFL scores for university ESL decision-making: Placement trends and effect of time lag. Language Testing, 29(2), 291–308. https://doi.org/10.1177/0265532211429403
  • Kotowicz, J., Woll, B., & Herman, R. (2020). Adaptation of the British Sign Language Receptive Skills Test into Polish Sign Language. Language Testing, 38(1), 132–153. https://doi.org/10.1177/0265532220924598
  • Kozaki, Y. (2010). An alternative decision-making procedure for performance assessments: Using the multifaceted Rasch model to generate cut estimates. Language Assessment Quarterly, 7(1), 75–95. https://doi.org/10.1080/15434300903464400
  • Kremmel, B., & Schmitt, N. (2016). Interpreting vocabulary test scores: What do various item formats tell us about learners’ ability to employ words? Language Assessment Quarterly, 13(4), 377–392. https://doi.org/10.1080/15434303.2016.1237516
  • Kuhn, K. D. (2018). Using structural topic modeling to identify latent topics and trends in aviation incident reports. Transportation Research Part C Emerging Technologies, 87, 105–122. https://doi.org/10.1016/j.trc.2017.12.018
  • Kunnan, A. J. (2009). Testing for citizenship: The U.S. naturalization test. Language Assessment Quarterly, 6(1), 89–97. https://doi.org/10.1080/15434300802606630
  • Kyle, K., & Crossley, S. (2017). Assessing syntactic sophistication in L2 writing: A usage-based approach. Language Testing, 34(4), 513–535. https://doi.org/10.1177/0265532217712554
  • Kyle, K., Crossley, S. A., & Jarvis, S. (2021). Assessing the validity of lexical diversity indices using direct judgements. Language Assessment Quarterly, 18(2), 154–170. https://doi.org/10.1080/15434303.2020.1844205
  • Lam, R. (2014). Language assessment training in Hong Kong: Implications for language assessment literacy. Language Testing, 32(2), 169–197. https://doi.org/10.1177/0265532214554321
  • Lam, D. M. K. (2019). Interactional Competence with and without Extended Planning Time in a Group Oral Assessment. Language Assessment Quarterly, 16(1), 1–20. https://doi.org/10.1080/15434303.2019.1602627
  • Laufer, B., & McLean, S. (2016). Loanwords and vocabulary size test scores: A case of different estimates for different L1 learners. Language Assessment Quarterly, 13(3), 202–217. https://doi.org/10.1080/15434303.2016.1210611
  • Li, X., Dai, A., Tran, R., & Wang, J. (2023). Text mining-based identification of promising miRNA biomarkers for diabetes mellitus. Frontiers in Endocrinology, 14. https://doi.org/10.3389/fendo.2023.1195145
  • Liu, H. Y., You, X. F., Wang, W. Y., Ding, S. L., & Chang, H. H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30(2), 152-172. https://doi.org/10.1007/s00357-013-9128-5
  • Liu, T., Aryadoust, V., & Foo, S. (2021). Examining the factor structure and its replicability across multiple listening test forms: Validity evidence for the Michigan English Test. Language Testing, 39(1), 142–171. https://doi.org/10.1177/02655322211018139
  • Manias, E., & McNamara, T. (2016). Standard setting in specific-purpose language testing: What can a qualitative study add? Language Testing, 33(2), 235–249. https://doi.org/10.1177/0265532215608411
  • May, L. (2011). Interactional competence in a paired speaking test: Features salient to raters. Language Assessment Quarterly, 8(2), 127–145. https://doi.org/10.1080/15434303.2011.565845
  • McNamara, T. (2009). Australia: The dictation tests redux? Language Assessment Quarterly, 6(1), 106–111. https://doi.org/10.1080/15434300802606663
  • McNamara, T., & Ryan, K. (2011). Fairness versus justice in language testing: The place of English literacy in the Australian citizenship Test. Language Assessment Quarterly, 8(2), 161–178. https://doi.org/10.1080/15434303.2011.565438
  • Min, S., & He, L. (2014). Applying unidimensional and multidimensional item response theory models in testlet-based reading assessment. Language Testing, 31(4), 453–477. https://doi.org/10.1177/0265532214527277
  • Min, S., Cai, H., & He, L. (2021). Application of bi-factor MIRT and higher-order CDM models to an in-house EFL listening test for diagnostic purposes. Language Assessment Quarterly, 19(2), 189–213. https://doi.org/10.1080/15434303.2021.1980571
  • Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422.
  • O’Hagan, S., Pill, J., & Zhang, Y. (2015). Extending the scope of speaking assessment criteria in a specific-purpose language test: Operationalizing a health professional perspective. Language Testing, 33(2), 195–216. https://doi.org/10.1177/0265532215607920
  • Olson, D. J. (2023). Measuring bilingual language dominance: An examination of the reliability of the Bilingual Language Profile. Language Testing, 40(3), 521–547. https://doi.org/10.1177/02655322221139162
  • Peña, E. D., Bedore, L. M., Lugo-Neris, M. J., & Albudoor, N. (2020). Identifying developmental language disorder in school-age bilinguals: Semantics, grammar, and narratives. Language Assessment Quarterly, 17(5), 541–558. https://doi.org/10.1080/15434303.2020.1827258
  • Plough, I. C., & Bogart, P. S. H. (2008). Perceptions of examiner behavior modulate power relations in oral performance testing. Language Assessment Quarterly, 5(3), 195–217. https://doi.org/10.1080/15434300802229375
  • Pill, J. (2015). Drawing on indigenous criteria for more authentic assessment in a specific-purpose language test: Health professionals interacting with patients. Language Testing, 33(2), 175–193. https://doi.org/10.1177/0265532215607400
  • Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4), 1064–1082. https://doi.org/10.1111/ajps.12103
  • Roberts, M. E., Stewart, B. M., & Tingley, D. (2019). stm: An R package for structural topic models. Journal of Statistical Software, 91(2). https://doi.org/10.18637/jss.v091.i02
  • Robles-García, P., McLean, S., Stewart, J., Shin, J. Y., & Sánchez-Gutiérrez, C. H. (2024). The development and initial validation of O-WSVLT, a meaning-recall online L2 Spanish vocabulary levels test. Language Assessment Quarterly, 21(2), 181–205. https://doi.org/10.1080/15434303.2024.2311724
  • Scarino, A. (2013). Language assessment literacy as self-awareness: Understanding the role of interpretation in assessment and in teacher learning. Language Testing, 30(3), 309–327. https://doi.org/10.1177/0265532213480128
  • Schaefer, E. (2008). Rater bias patterns in an EFL writing assessment. Language Testing, 25(4), 465–493. https://doi.org/10.1177/0265532208094273
  • Schissel, J. L., López-Gopar, M., Leung, C., Morales, J., & Davis, J. R. (2019). Classroom-based assessments in linguistically diverse communities: A case for collaborative research methodologies. Language Assessment Quarterly, 16(4–5), 393–407. https://doi.org/10.1080/15434303.2019.1678041
  • Segbers, J., & Schroeder, S. (2017). How many words do children know? A corpus-based estimation of children’s total vocabulary size. Language Testing, 34(3), 297–320. https://doi.org/10.1177/0265532216641152
  • Shi, B., Huang, L., & Lu, X. (2020). Effect of prompt type on test-takers’ writing performance and writing strategy use in the continuation task. Language Testing, 37(3), 361–388. https://doi.org/10.1177/0265532220911626
  • Silge, J., & Robinson, D. (2016). tidytext: Text mining and analysis using tidy data principles in R. Journal of Open Source Software, 1(3), 37. https://doi.org/10.21105/joss.00037
  • Stewart, J., Vitta, J. P., Nicklin, C., McLean, S., Pinchbeck, G. G., & Kramer, B. (2021). The relationship between word difficulty and frequency: A response to Hashimoto. Language Assessment Quarterly, 19(1), 90–101. https://doi.org/10.1080/15434303.2021.1992629
  • Tonidandel, S., Summerville, K. M., Gentry, W. A., & Young, S. F. (2021). Using structural topic modeling to gain insight into challenges faced by leaders. The Leadership Quarterly, 33(5), 101576. https://doi.org/10.1016/j.leaqua.2021.101576
  • Usman, N., Hendrik, H., & Madehang, M. (2024). Difficulties in understanding the TOEFL reading test of English language education study program at university. IDEAS: Journal on English Language Teaching and Learning, Linguistics and Literature, 12(1), 755–773. https://doi.org/10.24256/ideas.v12i1.5179
  • Vogt, K., Tsagari, D., & Spanoudis, G. (2020). What do teachers think they want? A comparative study of in-service language teachers’ beliefs on LAL training needs. Language Assessment Quarterly, 17(4), 386–409. https://doi.org/10.1080/15434303.2020.1781128
  • Wang, P. A., & Hsieh, S. (2023). Incorporating structural topic modeling into short text analysis. Concentric: Studies in Linguistics, 49(1), 96–138. https://doi.org/10.1075/consl.22026.wan
  • Wolfersberger, M. (2013). Refining the construct of classroom-based writing-from-readings assessment: The role of task representation. Language Assessment Quarterly, 10(1), 49–72. https://doi.org/10.1080/15434303.2012.750661
  • Youn, S. J. (2019). Managing proposal sequences in role-play assessment: Validity evidence of interactional competence across levels. Language Testing, 37(1), 76–106. https://doi.org/10.1177/0265532219860077
There are 75 references in total.

Details

Primary Language Turkish
Subjects Measurement Theories and Applications in Education and Psychology
Section Research Article
Authors

Kübra Atalay Kabasakal 0000-0002-3580-5568

Duygu Koçak 0000-0003-3211-0426

Rabia Akcan 0000-0003-3025-774X

Submission Date July 2, 2025
Acceptance Date September 16, 2025
Publication Date January 31, 2026
Published Issue Year 2026, Volume: 27, Issue: 1

Cite

APA Atalay Kabasakal, K., Koçak, D., & Akcan, R. (2026). Yapısal Konu Modellemesi Yoluyla Eğitimde Ölçme Alanındaki Eğilimler ve İçgörüler: Dil Değerlendirmesi Üzerine Bir İnceleme. Ahi Evran Üniversitesi Kırşehir Eğitim Fakültesi Dergisi, 27(1), 290-317. https://doi.org/10.29299/kefad.1732570
