TY - JOUR
T1 - Discovering Hidden Patterns: Applying Topic Modeling in Qualitative Research
AU - Tat, Osman
AU - Aydogan, Izzettin
PY - 2024
DA - October
Y2 - 2024
DO - 10.21031/epod.1539694
JF - Journal of Measurement and Evaluation in Education and Psychology
JO - JMEEP
PB - Association for Measurement and Evaluation in Education and Psychology
WT - DergiPark
SN - 1309-6575
SP - 247
EP - 259
VL - 15
IS - 3
LA - en
AB - In qualitative studies, researchers must devote significant time and effort to extracting meaningful themes from large sets of texts and examining the links between themes, work that is frequently done manually. The availability of natural language models has enabled the application of a wide range of techniques for automatically detecting hierarchy, linkages, and latent themes in texts. Using unstructured qualitative data and a topic model based on BERTopic, this paper investigates the coherence of the topics obtained from the analysis with the predefined themes, the hierarchy among the topics, the similarity between the topics, and the distances between them. The qualitative data for this study were gathered from 106 students enrolled in a university-run pedagogical formation certificate program. In the BERTopic procedure, paraphrase-multilingual-MiniLM-L12-v2 was used as the sentence transformer model, UMAP as the dimension reduction method, and HDBSCAN as the clustering algorithm. BERTopic successfully identified six topics corresponding to the six predicted themes in the unstructured texts. Moreover, 74% of the texts containing some themes could be classified accurately. The algorithm was also able to identify which topics were similar and which differed significantly from the others.
It was concluded that BERTopic is a procedure that can identify themes that researchers do not notice depending on the density of the data in qualitative data analysis and has the potential to enable qualitative research to reach more detailed findings.
KW - BERTopic
KW - natural language processing
KW - topic modeling
UR - https://doi.org/10.21031/epod.1539694
L1 - https://dergipark.org.tr/en/download/article-file/4173833
ER -