Modeling Education Studies Indexed in Web of Science Using Natural Language Processing

Tuncer Akbay

doi:10.52911/itall.1193460

Research Article

Web of Science Atıf İndeksinde Yer Alan Eğitim Araştırmalarının Doğal Dil İşleme Yöntemiyle Modellenmesi

Year 2022, Volume: 3 Issue: 2, 129 - 143, 31.12.2022

Tuncer Akbay

https://doi.org/10.52911/itall.1193460

Cited By: 1

Abstract

With advancing technology, access to information sources has become easier. This has allowed researchers to publish more in less time, and most of these publications to be published and stored electronically. Most academic publications are indexed in, and accessed through, scholarly databases such as Web of Science and Scopus. These databases store thousands, even millions, of research reports. Although indexes such as Web of Science and Scopus offer search engines and filtering options for retrieving data from their subscription-based databases, it is still difficult to find publications whose contents are closely related. Artificial intelligence technologies such as natural language processing allow documents to be categorized by content. Top2Vec is one of the unsupervised topic modeling algorithms that let users categorize documents semantically. The purpose of this study is twofold: (1) to equip researchers with the ability to group content by applying natural language processing techniques, and (2) to identify the most published topics by grouping the contents of articles published in 2021 and indexed in the Web of Science ‘Education Scientific Disciplines’ category. A Google Colab notebook was used to write the Python code that runs the Top2Vec algorithm. In this study, 68 distinct topics were identified among the 8125 articles published in 2021 and indexed in the Web of Science database under the Education Scientific Disciplines category, and the number of articles in each topic was reported. After the modeled topics were ranked from the topic with the most publications (N=549) to the topic with the fewest (N=29), the keywords of the first eight topics were reported and discussed.
These eight most studied topics are: Physics Education (N=549), Online Education and Covid-19 (N=438), Chemistry Education (N=381), Mathematics Education and Reasoning (N=377), Psychology and Emotional State (N=257), Cultural Diversity in Education (N=228), Health and Life (N=223), Mentoring and Leadership (N=204).
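To put the reported topic sizes in proportion, the short calculation below expresses each of the eight largest topics as a share of the 8125 articles. It is only an arithmetic sketch: the counts are the ones given in the abstract, while the names `TOTAL_ARTICLES`, `top_topics`, and `share` are chosen here for illustration and do not come from the study's notebook.

```python
# Topic sizes as reported in the abstract (eight largest of the 68 topics).
TOTAL_ARTICLES = 8125

top_topics = {
    "Physics": 549,
    "Online Education and Covid": 438,
    "Chemistry": 381,
    "Math and Reasoning": 377,
    "Psychology and Emotions": 257,
    "Educational Diversity": 228,
    "Health and Life": 223,
    "Mentoring and Leadership": 204,
}


def share(count, total=TOTAL_ARTICLES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Combined size of the eight largest topics.
top8_total = sum(top_topics.values())

for name, count in top_topics.items():
    print(f"{name}: N={count} ({share(count)}%)")
print(f"Top eight combined: N={top8_total} ({share(top8_total)}%)")
```

Together the eight topics account for 2657 articles, a little under a third of the corpus, so the remaining 60 topics hold the other two thirds.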

Keywords

Topic modeling, Education research, Machine learning, Natural language processing, Top2Vec algorithm, Education, NLP, Artificial intelligence


Abstract

Easier access to information and resources has allowed researchers to conduct more studies and to publish most of them electronically. These publications are indexed in scholarly citation databases such as Web of Science and Scopus, which hold huge volumes of research reports. Even though these databases offer search engines and filtering options, it is still hard to locate publications whose contents are closely related. Artificial intelligence technologies such as Natural Language Processing allow documents to be categorized based on their content. Top2Vec is an unsupervised topic modeling algorithm that enables users to categorize documents semantically. The purpose of the current study is twofold: (1) to provide users with the ability to group documents by applying Natural Language Processing techniques, and (2) to reveal the topics with the highest number of articles indexed in 2021 in the ‘Education Scientific Disciplines’ category of the Web of Science Core Collection. A Colab notebook was used to write the Python code that runs the Top2Vec algorithm. The study yielded 68 distinct topics among the 8125 articles published in 2021 and indexed in the Web of Science database under the Education Scientific Disciplines category. After the modeled topics were ranked from the topic with the largest number of documents (N=549) to the topic with the smallest (N=29), the findings for the first eight topics were presented and discussed. These eight most studied topics are: Physics (N=549), Online Education and Covid (N=438), Chemistry (N=381), Math and Reasoning (N=377), Psychology and Emotions (N=257), Educational Diversity (N=228), Health and Life (N=223), Mentoring and Leadership (N=204).
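The workflow described in the abstract (fit Top2Vec on the article texts, then rank the discovered topics by size) might look roughly like the sketch below. This is not the author's notebook: the function names `rank_topics` and `model_topics`, the `speed` and `workers` settings, and the choice to report ten words per topic are assumptions made for illustration. It relies on the third-party `top2vec` package, which is imported only inside `model_topics` so that the ranking helper can be used without it.

```python
from typing import List, Sequence, Tuple


def rank_topics(
    topic_sizes: Sequence[int], topic_nums: Sequence[int], k: int = 8
) -> List[Tuple[int, int]]:
    """Return the k largest topics as (topic_number, document_count) pairs."""
    pairs = sorted(zip(topic_nums, topic_sizes), key=lambda p: p[1], reverse=True)
    return [(int(num), int(size)) for num, size in pairs[:k]]


def model_topics(abstracts: List[str], k: int = 8):
    """Fit Top2Vec on a list of texts and describe the k largest topics.

    Assumes `pip install top2vec` has been run (for example in a Colab cell)
    and that `abstracts` holds enough documents for clusters to form.
    """
    from top2vec import Top2Vec  # third-party; deferred import

    # Hypothetical settings; the abstract does not state the ones actually used.
    model = Top2Vec(abstracts, speed="learn", workers=2)

    # Number of documents assigned to each discovered topic.
    topic_sizes, topic_nums = model.get_topic_sizes()

    # Never ask for more topics than the model found.
    k = min(k, model.get_num_topics())
    topic_words, _word_scores, _nums = model.get_topics(k)

    ranked = rank_topics(topic_sizes, topic_nums, k)
    top_words = [list(words[:10]) for words in topic_words]
    return ranked, top_words
```

Calling `model_topics(abstracts)` on a list of article abstracts would return the largest topics with their document counts alongside the most representative words for each, which is the kind of output the abstract summarizes for its eight largest topics.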

Keywords

Topic modeling, Machine Learning, Education, NLP, Artificial intelligence, Top2Vec algorithm


Details

Primary Language	English
Subjects	Other Fields of Education
Journal Section	Research Articles
Authors	Tuncer Akbay 0000-0003-3938-1026
Early Pub Date	October 24, 2022
Publication Date	December 31, 2022
Submission Date	October 23, 2022
Acceptance Date	December 6, 2022
Published in Issue	Year 2022 Volume: 3 Issue: 2

Cite

APA	Akbay, T. (2022). Modeling Education Studies Indexed in Web of Science Using Natural Language Processing. Instructional Technology and Lifelong Learning, 3(2), 129-143. https://doi.org/10.52911/itall.1193460

Instructional Technology and Lifelong Learning

Cited By

INCEPTION SH: A NEW CNN MODEL BASED ON INCEPTION MODULE FOR CLASSIFYING SCENE IMAGES

Mühendislik Bilimleri ve Tasarım Dergisi

https://doi.org/10.21923/jesd.1372788