Modeling Education Studies Indexed in Web of Science Using Natural Language Processing

Tuncer Akbay

doi:10.52911/itall.1193460


Modeling Education Research Indexed in the Web of Science Citation Index Using Natural Language Processing

Abstract

With developing technology, access to information sources has become easier. This has enabled researchers to publish more in a shorter time and to have the vast majority of their work published and stored electronically. A large share of academic publications are indexed in scientific databases such as Web of Science and Scopus and are accessed through those databases. These databases store thousands, even millions, of research reports. Although indexes such as Web of Science and Scopus provide search engines and filtering options for retrieving records from their subscription-based databases, it is still difficult to find publications whose contents are closely related. Artificial intelligence technologies such as natural language processing make it possible to sort documents into categories according to their content. Top2Vec is an unsupervised topic modeling algorithm that lets users categorize documents semantically. The purpose of this study is twofold: (1) to help researchers gain the skill of grouping content by applying natural language processing techniques, and (2) to identify the most frequently published topics by grouping the contents of articles published in 2021 and indexed in Web of Science under the 'Education Scientific Disciplines' category. The Python code for running the Top2Vec algorithm was written in a Google Colab notebook. In this study, 68 distinct topics were identified among the 8125 articles published in 2021 and indexed in the Web of Science database under the Education Scientific Disciplines category, and the number of articles in each topic was reported. After the modeled topics were ranked from the topic with the most publications (i.e., articles; N=549) to the topic with the fewest (N=29), the keywords of the first eight topics were reported and discussed.
The eight most researched topics are: Physics Education (N=549), Online Education and Covid-19 (N=438), Chemistry Education (N=381), Mathematics Education and Reasoning (N=377), Psychology and Mood (N=257), Cultural Diversity in Education (N=228), Health and Life (N=223), and Mentoring and Leadership (N=204).
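Top2Vec does not require the number of topics to be specified in advance. It embeds documents and words in one shared vector space (doc2vec by default), reduces the document vectors with UMAP, finds dense clusters of documents with HDBSCAN, takes the centroid of each cluster's document vectors as the topic vector, and describes each topic by the word vectors closest to it. The plain-Python sketch below illustrates only the last two steps. It is not the study's code or the `top2vec` package API: the embeddings and cluster labels are hypothetical toy values, assumed to be the output of the earlier steps.

```python
import math

# Hypothetical toy embeddings (3 dimensions) standing in for the document and
# word vectors that Top2Vec learns jointly, e.g. with doc2vec.
DOC_VECTORS = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.8, 0.2, 0.1],
    "doc3": [0.1, 0.9, 0.1],
    "doc4": [0.0, 0.8, 0.2],
    "doc5": [0.2, 0.9, 0.0],
}
WORD_VECTORS = {
    "physics": [1.0, 0.0, 0.0],
    "force": [0.9, 0.1, 0.1],
    "online": [0.0, 1.0, 0.0],
    "covid": [0.1, 0.9, 0.1],
    "mentoring": [0.0, 0.1, 1.0],
}
# Hypothetical cluster labels standing in for the dense clusters that
# Top2Vec finds with HDBSCAN on UMAP-reduced document vectors.
CLUSTER_OF = {"doc1": 0, "doc2": 0, "doc3": 1, "doc4": 1, "doc5": 1}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def topic_vectors(doc_vectors, cluster_of):
    """Topic vector of each cluster = centroid (mean) of its document vectors."""
    members = {}
    for doc, label in cluster_of.items():
        members.setdefault(label, []).append(doc_vectors[doc])
    return {
        label: [sum(values) / len(vecs) for values in zip(*vecs)]
        for label, vecs in members.items()
    }


def topic_words(topic_vec, word_vectors, k=2):
    """Topic words = the k words whose vectors are most similar to the topic vector."""
    ranked = sorted(word_vectors,
                    key=lambda w: cosine(topic_vec, word_vectors[w]),
                    reverse=True)
    return ranked[:k]


topics = topic_vectors(DOC_VECTORS, CLUSTER_OF)
for label in sorted(topics):
    print(label, topic_words(topics[label], WORD_VECTORS))
```

Running it prints `0 ['force', 'physics']` and `1 ['covid', 'online']`: each toy cluster is labeled by the words nearest its centroid. In Top2Vec the centroid is computed in the full embedding space rather than the UMAP-reduced one, which keeps topic vectors directly comparable with word vectors.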

Keywords

Topic modeling, Education research, Education, Machine learning, Natural language processing (NLP), Artificial intelligence, Top2Vec algorithm

Modeling Education Studies Indexed in Web of Science Using Natural Language Processing

Abstract

Easier access to information and resources has allowed researchers to conduct more studies and to publish most of them electronically. Many of these publications are indexed in scholarly citation databases such as Web of Science and Scopus, which hold huge volumes of research reports. Even though these databases offer search engines and filtering options, it is still hard to locate publications whose contents are closely related. Artificial intelligence technologies, such as Natural Language Processing, allow documents to be categorized based on their content. Top2Vec is an unsupervised topic modeling algorithm that enables users to categorize documents semantically. The purpose of the current study is twofold: (1) to provide users with the ability to group documents by applying Natural Language Processing techniques, and (2) to reveal the topics with the highest number of articles published in 2021 and indexed in the 'education scientific disciplines' category of the Web of Science Core Collection. A Google Colab notebook was used to write the Python code that runs the Top2Vec algorithm. The study yielded 68 distinct topics among the 8125 articles published in 2021 and indexed in the Web of Science database under the Education Scientific Disciplines category. After the modeled topics were ranked from the topic with the largest number of documents (N=549) to the topic with the fewest (N=29), the findings for the first eight topics were presented and discussed. These eight most-studied topics are: Physics (N=549), Online Education and Covid-19 (N=438), Chemistry (N=381), Math and Reasoning (N=377), Psychology and Emotions (N=257), Educational Diversity (N=228), Health and Life (N=223), and Mentoring and Leadership (N=204).
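The ranking step described in the abstract, in which topics are ordered by their number of articles and the first eight are reported, reduces to counting documents per topic and sorting by count. The sketch below assumes each article has already been assigned to exactly one topic (Top2Vec assigns each document to its nearest topic vector); the toy labels and the function names `rank_topics` and `coverage` are hypothetical. The last lines use only the eight counts reported in the abstract to compute what share of the 8125 articles those topics cover.

```python
from collections import Counter


def rank_topics(doc_topics, top_n=8):
    """Count documents per topic; return the top_n (topic, count) pairs, largest first."""
    return Counter(doc_topics).most_common(top_n)


def coverage(counts, total_docs):
    """Share of the whole corpus covered by the given per-topic counts."""
    return sum(counts) / total_docs


# Toy input (hypothetical): one topic label per document.
doc_topics = (["physics"] * 5 + ["online_covid"] * 4 + ["chemistry"] * 3
              + ["math"] * 2 + ["mentoring"])
print(rank_topics(doc_topics, top_n=3))
# [('physics', 5), ('online_covid', 4), ('chemistry', 3)]

# Sizes of the eight largest topics as reported in the abstract (of 8125 articles).
top_eight = [549, 438, 381, 377, 257, 228, 223, 204]
print(sum(top_eight), round(coverage(top_eight, 8125), 3))
# 2657 0.327
```

By this arithmetic, the eight largest topics account for 2657 articles, about 32.7% of the corpus, leaving 5468 articles spread over the remaining 60 topics. When the `top2vec` package is used, the per-topic document counts come from the trained model through its `get_topic_sizes()` method, so only the selection and coverage steps would remain.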

Keywords

Topic modeling, Machine Learning, Education, NLP, Artificial intelligence, Top2Vec algorithm

Details

Primary Language

English

Subjects

Other Fields of Education

Journal Section

Research Article

Authors

Tuncer Akbay
ORCID: 0000-0003-3938-1026
Türkiye

Publication Date

December 31, 2022

Submission Date

October 23, 2022

Acceptance Date

December 6, 2022

Published in Issue

Year 2022 Volume: 3 Number: 2

DOI

https://doi.org/10.52911/itall.1193460

IZ

https://izlik.org/JA45XR85AP

APA

Akbay, T. (2022). Modeling Education Studies Indexed in Web of Science Using Natural Language Processing. Instructional Technology and Lifelong Learning, 3(2), 129-143. https://doi.org/10.52911/itall.1193460

Cited By

INCEPTION SH: A NEW CNN MODEL BASED ON INCEPTION MODULE FOR CLASSIFYING SCENE IMAGES

Mühendislik Bilimleri ve Tasarım Dergisi

https://doi.org/10.21923/jesd.1372788

This work is licensed under a Creative Commons Attribution 4.0 International License.