TY - JOUR
T1 - Zaman Serisi Tabanlı Makine Öğrenmesi Modelleri ile GitHub Projelerindeki Programlama Dili Popülerliğinin Tahmini (2011–2021)
TT - Time Series–Based Machine Learning Models for Forecasting Programming Language Popularity on the GitHub Projects (2011–2021)
AU - Uğurlu, Bora
AU - Karasulu, Bahadir
PY - 2025
DA - December
Y2 - 2025
DO - 10.34186/klujes.1790613
JF - Kirklareli University Journal of Engineering and Science
JO - KLUJES
PB - Kirklareli University
WT - DergiPark
SN - 2458-7494
SP - 351
EP - 365
VL - 11
IS - 2
LA - tr
AB - Bu çalışmada GitHub platformunda 2011–2021 dönemine ait farklı programlama dillerinin depo (repository), çekme isteği (PR) ve sorun (issue) verileri kullanılarak, dillerin popülerliği zaman serisi tabanlı makine öğrenmesi yöntemleriyle tahmin edilmiştir. Üç farklı kaynaktan bütünleştirilen veri kümesi, dil–yıl–çeyrek düzeyinde PR, issue ve depo sayılarını içermekte; farklı kaynaklardan elde edilen metrikler tek bir zaman çizelgesinde birleştirilerek her dil için çeyreklik gözlemler üzerinden modelleme yapılmasına olanak vermektedir. Öznitelik mühendisliği sonrasında lojistik regresyon, karar ağaçları, rastgele orman, destek vektör makineleri ve gradyan artırma yöntemleri uygulanmıştır. Bulgular, Lojistik Regresyonun (AUC=0,996), Rastgele Ormanın (AUC=0,994) ve SVM’nin (AUC=0,988) güçlü ayırt edicilik sağladığını; Karar Ağaçları ve Gradyan Artırmanın ise yüksek doğruluk değerlerine rağmen ROC-AUC açısından daha zayıf kaldığını göstermektedir. Bu kapsamda, doğruluk ile ROC-AUC’nin birlikte raporlanması yöntemler arasındaki ayrım gücünü daha görünür kılmaktadır. Ayrıca analizler, Python ve JavaScript gibi dillerin uzun vadeli yükselişini doğrulamış, karar ağaçları ve gradyan artırma nadir dönemlerde öne çıkan dilleri yakalamada daha dengeli sonuçlar sunmuştur.
KW - Programlama Dilleri
KW - Makine Öğrenmesi
KW - Zaman Serisi Analizi
N2 - In this study, popularity trends of programming languages were predicted using time-series–based machine learning methods on GitHub data covering 2011–2021. The integrated dataset, compiled from three different sources, contains counts of repositories, pull requests (PRs), and issues at the language–year–quarter level; by consolidating metrics from multiple sources into a single timeline, it enables quarter-based modeling for each language. Following feature engineering, logistic regression, decision trees, random forests, support vector machines (SVM), and gradient boosting were applied. The findings indicate that Logistic Regression (AUC = 0.996), Random Forest (AUC = 0.994), and SVM (AUC = 0.988) provide strong discriminative performance, whereas Decision Trees and Gradient Boosting remain weaker in terms of ROC-AUC despite achieving high accuracy. In this context, reporting accuracy together with ROC-AUC makes differences in discriminative power across methods more apparent. Moreover, the analyses confirm the long-term rise of languages such as Python and JavaScript; decision trees and gradient boosting yield more balanced results in capturing languages that become prominent during rare periods.
CR - Biau, G., & Scornet, E. (2016). A random forest guided tour. Test, 25(2), 197–227. https://doi.org/10.1007/s11749-016-0481-7
CR - Bissyandé, T. F., Lo, D., Jiang, L., Réveillère, L., Klein, J., & Le Traon, Y. (2013). Got issues? Who cares about it? A large scale investigation of issue trackers from GitHub. 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE), 188–197. IEEE. https://doi.org/10.1109/ISSRE.2013.6698917
CR - Borges, H., Hora, A., & Valente, M. T. (2016). Predicting the popularity of GitHub repositories. Proceedings of the 12th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2016), 1–10. ACM. https://doi.org/10.1145/2972958.2972966
CR - Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth International Group.
CR - Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
CR - Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785
CR - Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
CR - Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B (Methodological), 20(2), 215–242.
CR - Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.
CR - GitHub Staff. (2024, October 29). Octoverse: AI leads Python to top language as the number of global developers surges. GitHub Blog. https://github.blog/news-insights/octoverse/octoverse-2024/
CR - Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression (3rd ed.). Wiley.
CR - Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., ... & Liu, T. Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems (NeurIPS), 30, 3146–3154.
CR - Menard, S. (2002). Applied Logistic Regression Analysis (2nd ed.). Sage.
CR - Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. https://doi.org/10.1007/BF00116251
CR - Rahman, M. M., & Roy, C. K. (2014). An insight into the pull requests of GitHub. Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014), 364–367. ACM. https://doi.org/10.1145/2597073.2597076
CR - Ray, B., Posnett, D., Devanbu, P., & Filkov, V. (2017). A large-scale study of programming languages and code quality in GitHub. Communications of the ACM, 60(10), 91–100. https://doi.org/10.1145/3126905
CR - Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
CR - Wen, I. (2021). GitHub Programming Languages Data (2011–2021) [Dataset]. Kaggle. https://www.kaggle.com/datasets/isaacwen/github-programming-languages-data
CR - Wessel, M., Vargovich, J., Gerosa, M. A., & Treude, C. (2023). GitHub Actions: The impact on the pull request process. Empirical Software Engineering, 28(131). https://doi.org/10.1007/s10664-023-10369-w
UR - https://doi.org/10.34186/klujes.1790613
L1 - https://dergipark.org.tr/en/download/article-file/5271929
ER -