Performance Comparison of Boosting Methods on Customer Churn Data
(Turkish title: Müşteri Terk Verisi Üzerinde Artırma Yöntemlerinin Performans Karşılaştırması)

Başak Ceren Seçik Göçer; İbrahim Emiroğlu

Research Article

Year 2024, Volume: 2, Issue: 1, pp. 1-16, 30.04.2024

Abstract

Data mining and machine learning models are now widely used to generate insights in churn analysis. Churn analysis lets businesses anticipate which customers are likely to leave the company or stop using its products, so that they can reduce churn and thereby increase both profit and customer satisfaction. Such analyses can be carried out in several ways: rule-based models can be developed, or predictions can be made with various machine learning models. In this study, machine learning models were built and analyzed on a publicly available data set from kaggle.com consisting of 7043 observations and 57 variables. Using the customer data, the models predict which customers will stay with the telecom company and which will leave, and the features that matter most for churn are discussed. LightGBM, XGBoost, CatBoost, and Gradient Boosting were used as the machine learning models, and the performance of these boosting methods was compared. The data set was balanced by applying resampling techniques. Model performance was assessed with the accuracy, F1-score, and sensitivity (recall) metrics. On accuracy and F1-score, no significant difference was found among the models; on sensitivity, however, the best performance was achieved by the Extreme Gradient Boosting (XGBoost) model, with a value of 0.949.
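The three evaluation metrics named in the abstract can be computed directly from confusion-matrix counts. The following is a minimal sketch in plain Python, not the authors' code: the helper names (`ConfusionCounts`, `confusion_counts`, and the metric functions) and the toy labels are assumptions made for illustration, and the printed values bear no relation to the results reported in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with churn as the positive class."""
    tp: int  # churners correctly predicted as churn
    fp: int  # stayers wrongly predicted as churn
    fn: int  # churners the model missed
    tn: int  # stayers correctly predicted as staying


def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells from paired labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def accuracy(c):
    """Share of all predictions that are correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def sensitivity(c):
    """Share of actual churners that were detected (also called recall)."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1_score(c):
    """Harmonic mean of precision and sensitivity: 2TP / (2TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Toy labels invented for illustration (1 = churned, 0 = stayed).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(f"accuracy={accuracy(counts):.3f}")        # accuracy=0.800
print(f"sensitivity={sensitivity(counts):.3f}")  # sensitivity=0.750
print(f"f1={f1_score(counts):.3f}")              # f1=0.750
```

Sensitivity looks only at the customers who actually churned, so two models with similar accuracy and F1-score can still differ in how many churners they catch.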

Keywords

Customer churn analysis, telecommunication, classification, machine learning

Details

Primary Language	Turkish
Subjects	Software Engineering (Other)
Section	Research Article
Authors	Başak Ceren Seçik Göçer 0009-0000-5336-1123; İbrahim Emiroğlu 0000-0001-7772-8217
Early View Date	30 April 2024
Publication Date	30 April 2024
Submission Date	30 January 2024
Acceptance Date	17 April 2024
Published in Issue	Year 2024, Volume: 2, Issue: 1

Cite

IEEE	B. C. Seçik Göçer and İ. Emiroğlu, “Müşteri Terk Verisi Üzerinde Artırma Yöntemlerinin Performans Karşılaştırması”, AMUBD, vol. 2, no. 1, pp. 1–16, 2024.
