Modeling Education Studies Indexed in Web of Science Using Natural Language Processing
Abstract
Easier access to information and resources has allowed researchers to conduct more studies and to publish most of them electronically. These publications are indexed in scholarly citation databases such as Web of Science and Scopus, which hold very large volumes of research reports. Although these databases offer search filtering options, it is still hard to locate publications whose contents are closely related. Artificial intelligence technologies such as Natural Language Processing (NLP) make it possible to categorize documents by their content. Top2Vec is an unsupervised topic modeling algorithm that lets users group documents semantically. The purpose of the current study is twofold: (1) to enable users to group documents by applying NLP techniques, and (2) to reveal the topics with the highest number of articles indexed in the ‘Education Scientific Disciplines’ category of the Web of Science Core Collection in 2021. The Python code that runs the Top2Vec algorithm was written and executed in a Google Colab notebook. The analysis yielded 68 distinct topics among the 8125 articles published in 2021 and indexed in Web of Science under the Education Scientific Disciplines category. The modeled topics were ranked from the topic with the largest number of documents (N=549) to the topic with the fewest (N=29), and the findings for the first eight topics are presented and discussed. These eight most studied topics are: Physics (N=549), Online Education and COVID (N=438), Chemistry (N=381), Math and Reasoning (N=377), Psychology and Emotions (N=257), Educational Diversity (N=228), Health and Life (N=223), and Mentoring and Leadership (N=204).
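The idea behind Top2Vec and the size ranking described above can be illustrated with a minimal sketch. It is not the top2vec package and not the code used in this study. It assumes that documents and words have already been embedded in one shared vector space (here, hypothetical hand-made 2-D vectors) and replaces Top2Vec's UMAP + HDBSCAN clustering step with cluster labels given by hand. Each topic vector is the centroid of a dense cluster of documents, every document is assigned to its most similar topic, the nearest word vectors describe each topic, and counting documents per topic gives the largest-to-smallest ranking used to select the top eight topics. The helper names (`topic_vectors`, `assign_documents`, `topic_words`, `rank_by_size`) are illustrative only.

```python
import numpy as np
from collections import Counter


def unit(x):
    """L2-normalise each row so that dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def topic_vectors(doc_vecs, cluster_labels):
    """Topic vector = centroid of the document vectors in each dense cluster.

    Label -1 marks noise points (HDBSCAN's convention) and forms no topic.
    """
    topics = np.unique(cluster_labels[cluster_labels >= 0])
    return np.array([doc_vecs[cluster_labels == k].mean(axis=0) for k in topics])


def assign_documents(doc_vecs, topic_vecs):
    """Give every document, noise points included, its most similar topic."""
    return np.argmax(unit(doc_vecs) @ unit(topic_vecs).T, axis=1)


def topic_words(topic_vecs, word_vecs, vocab, n=2):
    """Return the n vocabulary words whose vectors lie closest to each topic."""
    sims = unit(topic_vecs) @ unit(word_vecs).T
    return [[vocab[i] for i in np.argsort(-row)[:n]] for row in sims]


def rank_by_size(assignments):
    """(topic, number of documents) pairs, largest topic first."""
    return Counter(assignments.tolist()).most_common()


# Hypothetical toy embeddings: six documents and four words in a shared 2-D space.
doc_vecs = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
                     [0.1, 1.0], [0.2, 0.9], [0.7, 0.6]])
# Stand-in for the UMAP + HDBSCAN step: cluster labels given by hand.
cluster_labels = np.array([0, 0, 0, 1, 1, -1])
vocab = ["physics", "energy", "online", "covid"]
word_vecs = np.array([[1.0, 0.0], [0.9, 0.3], [0.0, 1.0], [0.4, 0.9]])

topics = topic_vectors(doc_vecs, cluster_labels)
assigned = assign_documents(doc_vecs, topics)
print(rank_by_size(assigned))                 # [(0, 4), (1, 2)]
print(topic_words(topics, word_vecs, vocab))  # [['physics', 'energy'], ['online', 'covid']]
```

With the top2vec package itself, the fitted model reports comparable counts directly (for example through its `get_topic_sizes()` method), so a manual count such as `rank_by_size` is mainly useful for understanding or checking the ranking.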
Keywords
Topic modeling, Machine learning, Education, NLP, Artificial intelligence, Top2Vec algorithm
