Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection

Gülbahar Merve Şilbir

doi:10.2339/politeknik.1587995

Research Article

Year 2025, EARLY VIEW, 1 - 1

Abstract

Developers play a key role in managing, storing, and analyzing the growing volume of biological data, and platforms such as Stack Overflow help identify current trends in the field. In this study, we analyzed bioinformatics-related posts shared on Stack Overflow using LDA topic modeling and the Louvain community detection algorithm. Our findings revealed that bioinformatics developers’ questions focused on 28 topics in four main categories. The most popular topics were “Gene Expression and Function”, “Protein Interaction Prediction”, “Gene and Protein Structure Analysis”, “Sample Analysis in Network Problems”, and “Genomic Data Management”. We also showed that the topics in bioinformatics form seven communities, and we presented the trends of these communities and the relationships among the 100 most central words. Our findings further revealed that the subjects of greatest interest to code developers in bioinformatics are “next generation sequencing”, “genome”, “gene”, “phylogeny”, “proteins”, and “sequence”. Taken together, these results use topic modeling and community detection to reveal the problems that bioinformatics developers have encountered over time.

Keywords

Bioinformatics, Bioinformatics Topics, Topic Modeling, Community Detection, Stack Overflow
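
The abstract describes grouping words into communities on a word network. As a rough illustration only, the sketch below builds a word co-occurrence graph from a few made-up tokenized posts and computes the modularity score of a given partition, which is the quantity the Louvain algorithm greedily maximizes. It does not run Louvain itself and is not the study's pipeline; the toy posts, the function names `cooccurrence_graph` and `modularity`, and the two hand-picked partitions are all assumptions introduced for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, made-up tokenized posts (NOT data from the study).
posts = [
    ["gene", "expression", "genome"],
    ["gene", "genome", "sequence"],
    ["expression", "genome", "sequence"],
    ["protein", "structure", "interaction"],
    ["protein", "interaction", "network"],
    ["structure", "network", "protein"],
    ["gene", "protein"],
]


def cooccurrence_graph(docs):
    """Undirected weighted graph: the weight of an edge (a, b) is the
    number of posts in which words a and b appear together."""
    weights = defaultdict(int)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(weights, community_of):
    """Modularity Q = sum over communities c of
    (internal_weight_c / m) - (total_degree_c / (2 * m)) ** 2,
    where m is the total edge weight of the graph."""
    m = sum(weights.values())
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for (a, b), w in weights.items():
        degree[community_of[a]] += w
        degree[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two hand-picked partitions to compare (assumptions for illustration).
genomic = {"gene", "expression", "genome", "sequence"}
graph = cooccurrence_graph(posts)
words = {w for edge in graph for w in edge}
two_groups = {w: ("genomic" if w in genomic else "protein") for w in words}
one_group = {w: "all" for w in words}

print("Q, two communities:", modularity(graph, two_groups))
print("Q, one community:  ", modularity(graph, one_group))
```

A partition that separates the genomic words from the protein words scores higher than lumping every word into one community, which is the signal a modularity-based method such as Louvain searches for when it proposes communities automatically.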

Details

Primary Language	English
Subjects	Semi- and Unsupervised Learning, Machine Learning (Other), Natural Language Processing
Journal Section	Research Article
Authors	Gülbahar Merve Şilbir 0000-0003-0321-7259
Early Pub Date	May 3, 2025
Publication Date
Submission Date	November 19, 2024
Acceptance Date	April 17, 2025
Published in Issue	Year 2025 EARLY VIEW

Cite

APA	Şilbir, G. M. (2025). Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection. Politeknik Dergisi, 1-1. https://doi.org/10.2339/politeknik.1587995
AMA	Şilbir GM. Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection. Politeknik Dergisi. Published online May 1, 2025:1-1. doi:10.2339/politeknik.1587995
Chicago	Şilbir, Gülbahar Merve. “Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection”. Politeknik Dergisi (May 2025), 1-1. https://doi.org/10.2339/politeknik.1587995.
EndNote	Şilbir GM (May 1, 2025) Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection. Politeknik Dergisi 1–1.
IEEE	G. M. Şilbir, “Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection”, Politeknik Dergisi, pp. 1–1, May 2025, doi: 10.2339/politeknik.1587995.
ISNAD	Şilbir, Gülbahar Merve. “Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection”. Politeknik Dergisi. May 2025. 1-1. https://doi.org/10.2339/politeknik.1587995.
JAMA	Şilbir GM. Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection. Politeknik Dergisi. 2025:1–1.
MLA	Şilbir, Gülbahar Merve. “Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection”. Politeknik Dergisi, 2025, pp. 1-1, doi:10.2339/politeknik.1587995.
Vancouver	Şilbir GM. Evaluation of Posts by Bioinformatics Code Developers on Stack Overflow Platform: Topic Modeling and Community Detection. Politeknik Dergisi. 2025:1-1.

This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International.