Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection
Abstract
Developers play a key role in managing, storing, and analyzing the growing volume of biological data, and platforms such as Stack Overflow help identify current trends in the field. In this study, we analyzed bioinformatics-related posts on Stack Overflow using LDA topic modeling and the Louvain community detection algorithm. Our findings revealed that bioinformatics developers’ questions focused on 28 topics in four main categories. The most popular topics were “Gene Expression and Function”, “Protein Interaction Prediction”, “Gene and Protein Structure Analysis”, “Sample Analysis in Network Problems”, and “Genomic Data Management”. In addition, we showed that the topics in bioinformatics form seven communities, and we presented the trends of these communities and the relationships among the 100 most central words. Our findings also revealed that the subjects of greatest interest to bioinformatics code developers are “next generation sequencing”, “genome”, “gene”, “phylogeny”, “proteins”, and “sequence”. Taken together, topic modeling and community detection reveal the problems that bioinformatics developers have encountered over time.
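The abstract does not give the study's preprocessing steps or parameters, so the following is only a minimal sketch of the idea behind the community detection step, not the authors' pipeline. It builds a word co-occurrence network from a few hypothetical posts and computes the modularity score that the Louvain algorithm tries to maximize. It does not run Louvain itself. The example posts, the function names `cooccurrence_graph` and `modularity`, and the two candidate partitions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(posts):
    """Return weighted undirected edges: words that share a post are linked."""
    edges = Counter()
    for post in posts:
        words = sorted(set(post.lower().split()))
        # Sorting keeps every edge key in a canonical (u < v) order.
        for u, v in combinations(words, 2):
            edges[(u, v)] += 1
    return edges


def modularity(edges, community_of):
    """Newman modularity Q of a partition (the objective Louvain maximizes)."""
    total = sum(edges.values())  # m: total edge weight in the graph
    strength = Counter()         # weighted degree of each word
    internal = Counter()         # edge weight that stays inside a community
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    comm_strength = Counter()    # summed weighted degree per community
    for word, s in strength.items():
        comm_strength[community_of[word]] += s
    # Q = sum over communities of (L_c / m) - (d_c / 2m)^2
    return sum(
        internal[c] / total - (comm_strength[c] / (2 * total)) ** 2
        for c in comm_strength
    )


# Hypothetical toy posts, not data from the study.
posts = [
    "gene expression rna",
    "rna gene sequence",
    "protein structure fold",
    "protein fold docking",
]
edges = cooccurrence_graph(posts)

# Candidate partition 1: genomics words versus protein words.
genomics = {"gene", "expression", "rna", "sequence"}
two_groups = {w: (0 if w in genomics else 1) for pair in edges for w in pair}
# Candidate partition 2: every word in a single community.
one_group = {w: 0 for w in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(f"two communities: Q = {q_two:.2f}, one community: Q = {q_one:.2f}")
```

In this toy network the two-group partition scores higher than the single-group one, which is the kind of comparison Louvain makes repeatedly while moving nodes between communities. A real analysis would more likely rely on an established library implementation than on hand-written code like this.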
Details
Primary Language
English
Subjects
Semi- and Unsupervised Learning, Machine Learning (Other), Natural Language Processing
Section
Research Article
Early View Date
May 3, 2025
Publication Date
March 3, 2026
Submission Date
November 19, 2024
Acceptance Date
April 17, 2025
Published in Issue
Year 2026 Volume: 29 Issue: 1