Research Article

Churn Analysis for the Telecommunication Sector with Data Mining and Machine Learning Techniques

Year 2021, , 172 - 191, 29.05.2021
https://doi.org/10.29130/dubited.807922

Abstract

With the growth of competition among companies in recent years, predicting which customers will cancel their subscriptions has become very important. Customer churn analysis is one of the types of analysis frequently encountered in fields such as data mining, machine learning, and deep learning. It is widely used, especially in sectors such as telecommunications, insurance, and banking. This study aims to predict customers who are likely to end their subscription, using data mining and machine learning techniques. The study sets out to find the best result among Logistic Regression, Decision Tree, Artificial Neural Network, Bagging, and Boosting classification models. Because the data set is imbalanced, sampling was performed with the SMOTE (Synthetic Minority Oversampling Technique) and ADASYN (Adaptive Synthetic Sampling Method) techniques. Two prediction models are proposed, and each consists of four phases: Data Set, Data Pre-Processing, Data Sampling, and Evaluation. In the Data Pre-Processing phase, several methods were used, such as removing unused and unimportant features from the data set, normalization, encoding, and oversampling. Accuracy Rate, Recall, Precision, Specificity, Balanced Accuracy Rate, and Area Under the ROC Curve (ROC-AUC) were used as performance measures. Judged by these measures, the better of the proposed prediction models was the one using the ADASYN sampling method. The classification method that gave the best result was LightGBM (Light Gradient Boosting Machine). The proposed models differ in the Data Pre-Processing and Data Sampling phases. The training time of the prediction models proposed in this study was found to be better than that reported in similar studies. In addition, using only 58 features, this study obtained results very close to those achieved by similar studies that used 172 features.
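The performance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names and the confusion-matrix counts below are hypothetical, chosen only to show how each measure is defined. ROC-AUC is computed here through its pairwise-ranking interpretation, which is one standard way to obtain it.

```python
def classification_metrics(tp, fp, tn, fn):
    """Threshold-based measures from confusion-matrix counts (churn = positive class)."""
    recall = tp / (tp + fn)            # share of actual churners that were caught
    specificity = tn / (tn + fp)       # share of non-churners correctly left alone
    precision = tp / (tp + fp)         # share of predicted churners that really churned
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts for an imbalanced test set: 50 churners, 150 non-churners.
metrics = classification_metrics(tp=30, fp=20, tn=130, fn=20)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

With these illustrative counts, accuracy (0.80) is noticeably higher than balanced accuracy (about 0.73), which suggests why an imbalanced churn data set is usually judged on more than accuracy alone.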

Thanks

I am grateful to TTG International Ltd. for supporting and funding this study, and to the experts who helped us with the data flow architecture. TTG International Ltd. is an OSS product supplier to government institutions and mobile network operators. TTG International Ltd. operates actively in several countries to support research and to encourage employee innovation through participation in R&D work.

Churn Analysis for Telecommunication Sector with Data Mining and Machine Learning


Abstract

With increasing competition among companies in recent years, it has become very important to predict which customers will churn. Churn analysis is one of the most common types of analysis in areas such as data mining, machine learning, and deep learning. It is widely used in sectors such as telecommunications, insurance, and banking. This study aims to predict customers who may end their subscription, using data mining and machine learning techniques. It sets out to find the best result among Logistic Regression, Decision Tree, Artificial Neural Network, Bagging, and Boosting classification models. Because the data set was imbalanced, sampling was performed using the SMOTE (Synthetic Minority Oversampling Technique) and ADASYN (Adaptive Synthetic Sampling Method) techniques. Two prediction models are proposed, each consisting of four phases: Data Set, Data Pre-Processing, Data Sampling, and Evaluation. In the Data Pre-Processing phase, several methods were used, such as removing unused and unimportant features from the data set, normalization, encoding, and oversampling. Accuracy Rate, Recall, Precision, Specificity, Balanced Accuracy Rate, and Area Under the ROC Curve (ROC-AUC) were used as performance measures. Judged by these measures, the better of the proposed prediction models was the one using the ADASYN sampling method. The most successful classification method was LightGBM (Light Gradient Boosting Machine). The proposed models differ in the Data Pre-Processing and Data Sampling phases. The prediction models proposed in this study were found to perform better than those in similar studies. In addition, using only 58 features, this study obtained results very close to those achieved by similar studies that used 172 features.
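The idea behind the oversampling step can be sketched in a few lines of Python. This is a simplified illustration of SMOTE-style interpolation and not the authors' pipeline: the function name, the parameters `k` and `seed`, and the toy points are all assumptions made for the example. Each synthetic sample is placed at a random position on the line segment between a minority-class point and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation.

    `minority` is a list of equal-length numeric tuples and must contain at
    least two points. This is a sketch of the SMOTE idea, not a full
    implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        u = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + u * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical minority-class points (two already-normalized features each).
churners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(churners, n_new=5)
```

ADASYN follows the same interpolation idea but generates more synthetic samples around minority points whose neighbourhoods contain many majority-class points; that weighting is omitted here. In practice such resampling is normally applied to the training split only, so that the evaluation data keeps its original class balance.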

Details

Primary Language Turkish
Subjects Engineering
Journal Section Articles
Authors

Furkan Uyanık 0000-0003-2127-963X

Mustafa Cem Kasapbaşı 0000-0001-6444-6659

Publication Date May 29, 2021
Published in Issue Year 2021

Cite

APA Uyanık, F., & Kasapbaşı, M. C. (2021). Telekomünikasyon Sektörü için Veri Madenciliği ve Makine Öğrenmesi Teknikleri ile Ayrılan Müşteri Analizi. Duzce University Journal of Science and Technology, 9(3), 172-191. https://doi.org/10.29130/dubited.807922
AMA Uyanık F, Kasapbaşı MC. Telekomünikasyon Sektörü için Veri Madenciliği ve Makine Öğrenmesi Teknikleri ile Ayrılan Müşteri Analizi. DÜBİTED. May 2021;9(3):172-191. doi:10.29130/dubited.807922
Chicago Uyanık, Furkan, and Mustafa Cem Kasapbaşı. “Telekomünikasyon Sektörü için Veri Madenciliği Ve Makine Öğrenmesi Teknikleri Ile Ayrılan Müşteri Analizi”. Duzce University Journal of Science and Technology 9, no. 3 (May 2021): 172-91. https://doi.org/10.29130/dubited.807922.
EndNote Uyanık F, Kasapbaşı MC (May 1, 2021) Telekomünikasyon Sektörü için Veri Madenciliği ve Makine Öğrenmesi Teknikleri ile Ayrılan Müşteri Analizi. Duzce University Journal of Science and Technology 9 3 172–191.
IEEE F. Uyanık and M. C. Kasapbaşı, “Telekomünikasyon Sektörü için Veri Madenciliği ve Makine Öğrenmesi Teknikleri ile Ayrılan Müşteri Analizi”, DÜBİTED, vol. 9, no. 3, pp. 172–191, 2021, doi: 10.29130/dubited.807922.
ISNAD Uyanık, Furkan - Kasapbaşı, Mustafa Cem. “Telekomünikasyon Sektörü için Veri Madenciliği Ve Makine Öğrenmesi Teknikleri Ile Ayrılan Müşteri Analizi”. Duzce University Journal of Science and Technology 9/3 (May 2021), 172-191. https://doi.org/10.29130/dubited.807922.
JAMA Uyanık F, Kasapbaşı MC. Telekomünikasyon Sektörü için Veri Madenciliği ve Makine Öğrenmesi Teknikleri ile Ayrılan Müşteri Analizi. DÜBİTED. 2021;9:172–191.
MLA Uyanık, Furkan and Mustafa Cem Kasapbaşı. “Telekomünikasyon Sektörü için Veri Madenciliği Ve Makine Öğrenmesi Teknikleri Ile Ayrılan Müşteri Analizi”. Duzce University Journal of Science and Technology, vol. 9, no. 3, 2021, pp. 172-91, doi:10.29130/dubited.807922.
Vancouver Uyanık F, Kasapbaşı MC. Telekomünikasyon Sektörü için Veri Madenciliği ve Makine Öğrenmesi Teknikleri ile Ayrılan Müşteri Analizi. DÜBİTED. 2021;9(3):172-91.