TY - JOUR T1 - Yapısal Konu Modellemesi Yoluyla Eğitimde Ölçme Alanındaki Eğilimler ve İçgörüler: Dil Değerlendirmesi Üzerine Bir İnceleme TT - Trends and Insights in Educational Measurement through Structural Topic Modeling: A Study in Language Assessment AU - Atalay Kabasakal, Kübra AU - Koçak, Duygu AU - Akcan, Rabia PY - 2026 DA - January Y2 - 2025 DO - 10.29299/kefad.1732570 JF - Ahi Evran Üniversitesi Kırşehir Eğitim Fakültesi Dergisi JO - KEFAD PB - Kırşehir Ahi Evran Üniversitesi WT - DergiPark SN - 2147-1037 SP - 291 EP - 318 VL - 27 IS - 1 LA - tr AB - In this study, Structural Topic Modeling (STM) was used to reveal thematic trends and research orientations in the field of educational measurement. To this end, as an example subfield application, a total of 778 articles published in the journals Language Testing and Language Assessment Quarterly over the last 16 years were analyzed. The STM analysis revealed ten distinct themes, the most prominent of which were “Social, Political, and Ethical Dimensions of Language Testing,” “Advancing Language Assessment Literacy,” and “Psychometric Approaches to Reading and Listening Assessment.” The study also highlights critical issues concerning rater reliability and draws attention to the central role of this topic in language assessment research. In addition, two interconnected themes stand out concerning the role of vocabulary knowledge in language proficiency, particularly in the contexts of sign language and bilingualism. The growing emphasis on the social, political, and ethical dimensions of language testing shows that the field goes beyond proficiency measurement alone and has the power to shape educational policies and practices. The prominence of psychometric methods and language assessment literacy, in turn, points to ongoing theoretical and methodological developments in the field. 
These findings offer researchers, policymakers, and practitioners important insights into how priorities and orientations in language assessment research are changing. KW - Text mining KW - Structural topic modeling KW - Language testing and assessment N2 - In this study, Structural Topic Modeling (STM) was employed to identify thematic trends and research orientations within the field of educational measurement. Accordingly, as a representative subfield application, a total of 778 articles published over the past 16 years in the journals Language Testing and Language Assessment Quarterly were analyzed. The STM analysis identified ten distinct themes, with the most prominent topics being “Social, Political, and Ethical Dimensions of Language Testing,” “Advancing Language Assessment Literacy,” and “Psychometric Approaches to Reading and Listening Assessment.” The study also highlights critical issues related to rater reliability, emphasizing its centrality in language assessment research. Furthermore, two interconnected themes emerge concerning the role of vocabulary in language proficiency, particularly in the contexts of sign language and bilingualism. The increasing emphasis on social, political, and ethical dimensions underscores the expanding impact of language testing beyond proficiency measurement, shaping policies and educational practices. Additionally, the prominence of psychometric methodologies and language assessment literacy reflects the field’s ongoing methodological and theoretical advancements. These findings offer valuable insights into emerging priorities and shifts in language assessment research for scholars, policymakers, and practitioners. CR - Aryadoust, V., Eckes, T., & In’nami, Y. (2021). Editorial: Frontiers in Language Assessment and Testing. Frontiers in Psychology, 12. https://doi.org/10.3389/fpsyg.2021.691614 CR - Aryadoust, V., Goh, C. C. M., & Kim, L. O. (2011). 
An investigation of differential item functioning in the MELAB listening Test. Language Assessment Quarterly, 8(4), 361–385. https://doi.org/10.1080/15434303.2011.628632 CR - Aryadoust, V., Zakaria, A., Lim, M. H., & Chen, C. (2020). An extensive knowledge mapping review of measurement and validity in language assessment and SLA research. Frontiers in Psychology, 11. https://doi.org/10.3389/fpsyg.2020.01941 CR - Bachman, L. F., & Clark, J. L. D. (1987). The measurement of Foreign/Second Language Proficiency. The Annals of the American Academy of Political and Social Science, 490(1), 20–33. https://doi.org/10.1177/0002716287490001003 CR - Bae, J., Bentler, P. M., & Lee, Y. (2016). On the role of content in writing assessment. Language Assessment Quarterly, 13(4), 302–328. https://doi.org/10.1080/15434303.2016.1246552 CR - Baker, B. A., & Riches, C. (2017). The development of EFL examinations in Haiti: Collaboration and language assessment literacy development. Language Testing, 35(4), 557–581. https://doi.org/10.1177/0265532217716732 CR - Banks, G. C., Woznyj, H. M., Wesslen, R. S., & Ross, R. L. (2018). A review of best practice recommendations for text analysis in R (and a User-Friendly app). Journal of Business and Psychology, 33(4), 445–459. https://doi.org/10.1007/s10869-017-9528-3 CR - Barkaoui, K. (2010a). Explaining ESL essay holistic scores: A multilevel modeling approach. Language Testing, 27(4), 515–535. https://doi.org/10.1177/0265532210368717 CR - Barkaoui, K. (2010b). Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing, 28(1), 51–75. https://doi.org/10.1177/0265532210376379 CR - Barkaoui, K. (2010c). Variability in ESL essay rating processes: The role of the rating scale and rater experience. Language Assessment Quarterly, 7(1), 54–74. https://doi.org/10.1080/15434300903464418 CR - Barkaoui, K. (2024). 
The Academic Achievement of Undergraduate Students with Different English Language Proficiency Profiles. Language Assessment Quarterly, 21(3), 224–244. https://doi.org/10.1080/15434303.2024.2346089 CR - Barkaoui, K. (2025). The relationship between English language proficiency test scores and academic achievement: A longitudinal study of two tests. Language Testing, 0(0). https://doi.org/10.1177/02655322251319284 CR - Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022. https://doi.org/10.5555/944919.944937 CR - Bochner, J. H., Samar, V. J., Hauser, P. C., Garrison, W. M., Searls, J. M., & Sanders, C. A. (2015). Validity of the American Sign Language Discrimination Test. Language Testing, 33(4), 473–495. https://doi.org/10.1177/0265532215590849 CR - Carlsen, C. H., & Rocca, L. (2021). Language test misuse. Language Assessment Quarterly, 18(5), 477–491. https://doi.org/10.1080/15434303.2021.1947288 CR - Cho, Y., & Bridgeman, B. (2012). Relationship of TOEFL iBT® scores to academic performance: Some evidence from American universities. Language Testing, 29(3), 421–442. https://doi.org/10.1177/0265532211430368 CR - Choi, H., & Woo, J. (2022). Investigating emerging hydrogen technology topics and comparing national level technological focus: Patent analysis using a structural topic model. Applied Energy, 313, 118898. https://doi.org/10.1016/j.apenergy.2022.118898 CR - Coghlan, S., Miller, T., & Paterson, J. (2021). Good proctor or “big brother”? Ethics of online exam supervision technologies. Philosophy & Technology, 34(4), 1581–1606. https://doi.org/10.1007/s13347-021-00476-1 CR - Eckes, T. (2012). Operational Rater types in writing assessment: linking rater cognition to rater behavior. Language Assessment Quarterly, 9(3), 270–292. https://doi.org/10.1080/15434303.2011.64938 CR - Elder, C., & McNamara, T. (2015). 
The hunt for “indigenous criteria” in assessing communication in the physiotherapy workplace. Language Testing, 33(2), 153–174. https://doi.org/10.1177/0265532215607398 CR - Fan, J., & Yan, X. (2020). Assessing Speaking Proficiency: A narrative review of speaking assessment research within the Argument-Based Validation Framework. Frontiers in Psychology, 11. https://doi.org/10.3389/fpsyg.2020.00330 CR - Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education. McGraw-Hill. CR - Gamaroff, R. (2000). Rater reliability in language assessment: The bug of all bears. System, 28(1), 31–53. https://doi.org/10.1016/S0346-251X(99)00059-7 CR - Gardner, R. C., & MacIntyre, P. D. (1992). A student’s contributions to second language learning. Part I: Cognitive variables. Language Teaching, 25(4), 211–220. https://doi.org/10.1017/S026144480000700X CR - Gokturk, N., & Chukharev, E. (2024). Exploring the potential of a spoken Dialog System-Delivered Paired Discussion task for assessing interactional competence. Language Assessment Quarterly, 21(1), 60–99. https://doi.org/10.1080/15434303.2023.2289173 CR - Hamdani, S., Chan, A., Kan, R., Chiat, S., Gagarina, N., Haman, E., … Armon-Lotem, S. (2024). Identifying developmental language disorder (DLD) in multilingual children: A case study tutorial. International Journal of Speech-Language Pathology, 1–15. https://doi.org/10.1080/17549507.2024.2326095 CR - Hauck, M. C., Wolf, M. K., & Mislevy, R. (2016). Creating a Next-Generation system of K-12 English learner language proficiency assessments. ETS Research Report Series, 2016(1), 1–10. https://doi.org/10.1002/ets2.12092 CR - Huang, F. L., & Konold, T. R. (2013). A latent variable investigation of the Phonological Awareness Literacy Screening-Kindergarten assessment: Construct identification and multigroup comparisons between Spanish-speaking English-language learners (ELLs) and non-ELL students. Language Testing, 31(2), 205–221. 
https://doi.org/10.1177/0265532213496773 CR - Isaacs, T., Hu, R., Trenkic, D., & Varga, J. (2023). Examining the predictive validity of the Duolingo English Test: Evidence from a major UK university. Language Testing, 40(3), 748–770. https://doi.org/10.1177/02655322231158550 CR - Isaacs, T., & Thomson, R. I. (2013). Rater experience, rating scale length, and judgments of L2 pronunciation: Revisiting research conventions. Language Assessment Quarterly, 10(2), 135–159. https://doi.org/10.1080/15434303.2013.769545 CR - Isbell, D. R., Kremmel, B., & Kim, J. (2023). Remote proctoring in Language Testing: Implications for fairness and justice. Language Assessment Quarterly, 20(4–5), 469–487. https://doi.org/10.1080/15434303.2023.2288251 CR - Jang, E. E., Cummins, J., Wagner, M., Stille, S., & Dunlop, M. (2015). Investigating the homogeneity and distinguishability of STEP proficiency descriptors in assessing English language learners in Ontario schools. Language Assessment Quarterly, 12(1), 87–109. https://doi.org/10.1080/15434303.2014.936602 CR - Javidanmehr, Z., & Sarab, M. R. A. (2019). Retrofitting non-diagnostic reading comprehension assessment: Application of the G-DINA model to a high-stake reading comprehension test. Language Assessment Quarterly, 16(3), 294–311. https://doi.org/10.1080/15434303.2019.1654479 CR - Kessler, G. (2018). Technology and the future of language teaching. Foreign Language Annals, 51(1), 205–218. CR - Kokhan, K. (2012). Investigating the possibility of using TOEFL scores for university ESL decision-making: Placement trends and effect of time lag. Language Testing, 29(2), 291–308. https://doi.org/10.1177/0265532211429403 CR - Kotowicz, J., Woll, B., & Herman, R. (2020). Adaptation of the British Sign Language Receptive Skills Test into Polish Sign Language. Language Testing, 38(1), 132–153. https://doi.org/10.1177/0265532220924598 CR - Kozaki, Y. (2010). 
An alternative decision-making procedure for performance assessments: Using the multifaceted Rasch model to generate cut estimates. Language Assessment Quarterly, 7(1), 75–95. https://doi.org/10.1080/15434300903464400 CR - Kremmel, B., & Schmitt, N. (2016). Interpreting vocabulary test scores: What do various item formats tell us about learners’ ability to employ words? Language Assessment Quarterly, 13(4), 377–392. https://doi.org/10.1080/15434303.2016.1237516 CR - Kuhn, K. D. (2018). Using structural topic modeling to identify latent topics and trends in aviation incident reports. Transportation Research Part C Emerging Technologies, 87, 105–122. https://doi.org/10.1016/j.trc.2017.12.018 CR - Kunnan, A. J. (2009). Testing for citizenship: The U.S. naturalization test. Language Assessment Quarterly, 6(1), 89–97. https://doi.org/10.1080/15434300802606630 CR - Kyle, K., & Crossley, S. (2017). Assessing syntactic sophistication in L2 writing: A usage-based approach. Language Testing, 34(4), 513–535. https://doi.org/10.1177/0265532217712554 CR - Kyle, K., Crossley, S. A., & Jarvis, S. (2021). Assessing the validity of lexical diversity indices using direct judgements. Language Assessment Quarterly, 18(2), 154–170. https://doi.org/10.1080/15434303.2020.1844205 CR - Lam, R. (2014). Language assessment training in Hong Kong: Implications for language assessment literacy. Language Testing, 32(2), 169–197. https://doi.org/10.1177/0265532214554321 CR - Lam, D. M. K. (2019). Interactional Competence with and without Extended Planning Time in a Group Oral Assessment. Language Assessment Quarterly, 16(1), 1–20. https://doi.org/10.1080/15434303.2019.1602627 CR - Laufer, B., & McLean, S. (2016). Loanwords and vocabulary size test scores: A case of different estimates for different L1 learners. Language Assessment Quarterly, 13(3), 202–217. https://doi.org/10.1080/15434303.2016.1210611 CR - Li, X., Dai, A., Tran, R., & Wang, J. (2023). 
Text mining-based identification of promising miRNA biomarkers for diabetes mellitus. Frontiers in Endocrinology, 14. https://doi.org/10.3389/fendo.2023.1195145 CR - Liu, H. Y., You, X. F., Wang, W. Y., Ding, S. L., & Chang, H. H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30(2), 152–172. https://doi.org/10.1007/s00357-013-9128-5 CR - Liu, T., Aryadoust, V., & Foo, S. (2021). Examining the factor structure and its replicability across multiple listening test forms: Validity evidence for the Michigan English Test. Language Testing, 39(1), 142–171. https://doi.org/10.1177/02655322211018139 CR - Manias, E., & McNamara, T. (2016). Standard setting in specific-purpose language testing: What can a qualitative study add? Language Testing, 33(2), 235–249. https://doi.org/10.1177/0265532215608411 CR - May, L. (2011). Interactional competence in a paired speaking test: Features salient to raters. Language Assessment Quarterly, 8(2), 127–145. https://doi.org/10.1080/15434303.2011.565845 CR - McNamara, T. (2009). Australia: The dictation tests redux? Language Assessment Quarterly, 6(1), 106–111. https://doi.org/10.1080/15434300802606663 CR - McNamara, T., & Ryan, K. (2011). Fairness versus justice in language testing: The place of English literacy in the Australian citizenship Test. Language Assessment Quarterly, 8(2), 161–178. https://doi.org/10.1080/15434303.2011.565438 CR - Min, S., & He, L. (2014). Applying unidimensional and multidimensional item response theory models in testlet-based reading assessment. Language Testing, 31(4), 453–477. https://doi.org/10.1177/0265532214527277 CR - Min, S., Cai, H., & He, L. (2021). Application of bi-factor MIRT and higher-order CDM models to an in-house EFL listening test for diagnostic purposes. Language Assessment Quarterly, 19(2), 189–213. https://doi.org/10.1080/15434303.2021.1980571 CR - Myford, C. M., & Wolfe, E. W. (2003). 
Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422. CR - O’Hagan, S., Pill, J., & Zhang, Y. (2015). Extending the scope of speaking assessment criteria in a specific-purpose language test: Operationalizing a health professional perspective. Language Testing, 33(2), 195–216. https://doi.org/10.1177/0265532215607920 CR - Olson, D. J. (2023). Measuring bilingual language dominance: An examination of the reliability of the Bilingual Language Profile. Language Testing, 40(3), 521–547. https://doi.org/10.1177/02655322221139162 CR - Peña, E. D., Bedore, L. M., Lugo-Neris, M. J., & Albudoor, N. (2020). Identifying developmental language disorder in school-age bilinguals: Semantics, grammar, and narratives. Language Assessment Quarterly, 17(5), 541–558. https://doi.org/10.1080/15434303.2020.1827258 CR - Plough, I. C., & Bogart, P. S. H. (2008). Perceptions of examiner behavior modulate power relations in oral performance testing. Language Assessment Quarterly, 5(3), 195–217. https://doi.org/10.1080/15434300802229375 CR - Pill, J. (2015). Drawing on indigenous criteria for more authentic assessment in a specific-purpose language test: Health professionals interacting with patients. Language Testing, 33(2), 175–193. https://doi.org/10.1177/0265532215607400 CR - Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder‐Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). Structural topic models for Open‐Ended Survey Responses. American Journal of Political Science, 58(4), 1064–1082. https://doi.org/10.1111/ajps.12103 CR - Roberts, M. E., Stewart, B. M., & Tingley, D. (2019). stm: An R package for structural topic models. Journal of Statistical Software, 91(2). https://doi.org/10.18637/jss.v091.i02 CR - Robles-García, P., McLean, S., Stewart, J., Shin, J. young, & Sánchez-Gutiérrez, C. H. (2024). 
The development and initial validation of O-WSVLT, a meaning-recall online L2 Spanish vocabulary levels test. Language Assessment Quarterly, 21(2), 181–205. https://doi.org/10.1080/15434303.2024.2311724 CR - Scarino, A. (2013). Language assessment literacy as self-awareness: Understanding the role of interpretation in assessment and in teacher learning. Language Testing, 30(3), 309–327. https://doi.org/10.1177/0265532213480128 CR - Schaefer, E. (2008). Rater bias patterns in an EFL writing assessment. Language Testing, 25(4), 465–493. https://doi.org/10.1177/0265532208094273 CR - Schissel, J. L., López-Gopar, M., Leung, C., Morales, J., & Davis, J. R. (2019). Classroom-based assessments in linguistically Diverse communities: a case for collaborative research methodologies. Language Assessment Quarterly, 16(4–5), 393–407. https://doi.org/10.1080/15434303.2019.1678041 CR - Segbers, J., & Schroeder, S. (2017). How many words do children know? A corpus-based estimation of children’s total vocabulary size. Language Testing, 34(3), 297–320. https://doi.org/10.1177/0265532216641152 CR - Shi, B., Huang, L., & Lu, X. (2020). Effect of prompt type on test-takers’ writing performance and writing strategy use in the continuation task. Language Testing, 37(3), 361–388. https://doi.org/10.1177/0265532220911626 CR - Silge, J., & Robinson, D. (2016). tidytext: Text mining and analysis using tidy data principles in R. Journal of Open-Source Software, 1(3), 37. https://doi.org/10.21105/joss.00037 CR - Stewart, J., Vitta, J. P., Nicklin, C., McLean, S., Pinchbeck, G. G., & Kramer, B. (2021). The Relationship between Word Difficulty and Frequency: A Response to Hashimoto. Language Assessment Quarterly, 19(1), 90–101. https://doi.org/10.1080/15434303.2021.1992629 CR - Tonidandel, S., Summerville, K. M., Gentry, W. A., & Young, S. F. (2021). Using structural topic modeling to gain insight into challenges faced by leaders. The Leadership Quarterly, 33(5), 101576. 
https://doi.org/10.1016/j.leaqua.2021.101576 CR - Usman, N., Hendrik, H., & Madehang, M. (2024). Difficulties in understanding the TOEFL reading test of English language education study program at university. IDEAS: Journal on English Language Teaching and Learning, Linguistics and Literature, 12(1), 755–773. https://doi.org/10.24256/ideas.v12i1.5179 CR - Vogt, K., Tsagari, D., & Spanoudis, G. (2020). What do teachers think they want? A comparative study of In-Service Language Teachers’ beliefs on LAL training needs. Language Assessment Quarterly, 17(4), 386–409. https://doi.org/10.1080/15434303.2020.1781128 CR - Wang, P. A., & Hsieh, S. (2023). Incorporating structural topic modeling into short text analysis. Concentric Studies in Linguistics, 49(1), 96–138. https://doi.org/10.1075/consl.22026.wan CR - Wolfersberger, M. (2013). Refining the construct of Classroom-Based Writing-From-Readings Assessment: The role of task Representation. Language Assessment Quarterly, 10(1), 49–72. https://doi.org/10.1080/15434303.2012.750661 CR - Youn, S. J. (2019). Managing proposal sequences in role-play assessment: Validity evidence of interactional competence across levels. Language Testing, 37(1), 76–106. https://doi.org/10.1177/0265532219860077 UR - https://doi.org/10.29299/kefad.1732570 L1 - https://dergipark.org.tr/tr/download/article-file/5013762 ER -