Customer Behavior Segmentation Using Fuzzy Clustering and Classification Techniques
Year 2025, Volume: 30 Issue: 3, 990-1008, 24.12.2025
Fatih Kutlu, Kübra Göleli, Hanım Demir, Gülcan Erdiz
Abstract
This study focuses on the classification of customer response behavior toward marketing campaign offers. The analysis utilizes the Customer Personality Analysis (CPA) dataset, which is publicly available on the Kaggle online data-sharing platform. The target variable, Response, indicates whether the customer accepted the most recent campaign offer (1 = accepted, 0 = not accepted). Since the proportion of positive responses is approximately 15% of all observations, the dataset exhibits a pronounced class imbalance. For this reason, performance evaluation prioritizes the macro-F1 metric rather than overall accuracy, as macro-F1 provides a more balanced representation of the minority class. The methodological framework involves the application of the Fuzzy C-Means (FCM) clustering algorithm to obtain membership degrees for each instance. These membership values are subsequently integrated into two classification models. In the FCM+FSVM model, the membership degrees are utilized as instance weights influencing the decision boundary. In the FCM+FKNN model, the same membership degrees are incorporated as adaptive weighting factors in the neighborhood-based voting mechanism. FCM hyperparameters are optimized using a genetic algorithm, while classifier hyperparameters are determined through random search. Comparative experiments including logistic regression, KNN, RBF-SVM, random forest, and gradient boosting demonstrate that the FCM+FSVM model achieves the highest performance in both overall classification accuracy and minority class recognition.
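The abstract describes obtaining FCM membership degrees and feeding them to the classifiers as weights. The following is a minimal sketch of the standard FCM update rules, not the authors' implementation. The function names, the toy data, the fuzzifier value m = 2, the fixed initial centers, and the choice of the largest membership as an instance weight are all assumptions made for illustration.

```python
import math

def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Membership of each point in each cluster for fixed centers (standard FCM rule)."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        # Euclidean distance to every center; eps guards against division by zero.
        dists = [max(math.dist(p, c), eps) for c in centers]
        row = [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
        memberships.append(row)
    return memberships

def fcm_centers(points, memberships, n_clusters, m=2.0):
    """Update each center as a mean weighted by memberships raised to the power m."""
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ])
    return centers

def fuzzy_c_means(points, init_centers, m=2.0, n_iter=50, tol=1e-6):
    """Alternate membership and center updates until the centers stop moving."""
    centers = [list(c) for c in init_centers]
    for _ in range(n_iter):
        u = fcm_memberships(points, centers, m)
        new_centers = fcm_centers(points, u, len(centers), m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, fcm_memberships(points, centers, m)

# Hypothetical 2-D toy data: two tight groups plus one point lying between them.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (3.0, 3.0), (3.2, 2.9), (2.9, 3.1),
          (1.5, 1.5)]
centers, u = fuzzy_c_means(points, init_centers=[(0.0, 0.0), (3.0, 3.0)])

# Assumed weighting scheme: use the largest membership of each point as its weight,
# so ambiguous points (like the last one) influence a downstream classifier less.
sample_weights = [max(row) for row in u]

for p, w in zip(points, sample_weights):
    print(p, round(w, 3))
```

In this sketch the point between the two groups receives a noticeably smaller weight than the points near a cluster center. Weights of this kind could, for example, be passed as per-instance weights to a weighted SVM or used to scale neighbor votes in a KNN classifier; how the paper maps memberships to weights is not specified in the abstract.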
Project Number
1919B012424159
Ethical Statement
The authors of this article declare that they complied with research and publication ethics in their work.
Supporting Institution
This work was supported by the Scientific and Technological Research Council of Türkiye (TÜBİTAK) under the 2209-A Research Project Support Programme for Undergraduate Students, with application number 1919B012424159.
Acknowledgments
This work was supported by the Scientific and Technological Research Council of Türkiye (TÜBİTAK) under the 2209-A Research Project Support Programme for Undergraduate Students, with application number 1919B012424159. We thank TÜBİTAK for its valuable contributions.