Detection of Suspicious Behaviors in Credit Cards Using K-Medoids Clustering, Boosting, and BERT Algorithms
(Turkish title: k-medoids Kümeleme, Boosting ve BERT algoritmalarıyla Kredi Kartlarındaki Şüpheli Davranışların Tespit Edilmesi)

Merve Pınar; Ayşe Berna Altınel Girgin

doi:10.24012/dumf.1643370

Research Article

Year 2025, Volume: 16, Issue: 2, pp. 345–356, 30.06.2025

https://izlik.org/JA77UJ48LS

Abstract

The increasing complexity of financial fraud makes early and accurate detection of suspicious behaviors in credit card transactions essential for robust financial security systems. This study proposes a hybrid model that integrates K-medoids-based unsupervised clustering with a range of supervised classification algorithms. The model partitions the dataset into subgroups so that the classifiers learn from more homogeneous patterns, thereby improving overall classification performance. The experiments use a publicly available credit card dataset of 284,807 anonymized transactions collected in Europe, whose features were anonymized through Principal Component Analysis (PCA). The effect of different training-set proportions (30%, 40%, and 80% of the data) on model performance is evaluated systematically. On the clustered data, a diverse set of classifiers is applied: Logistic Regression, Support Vector Machines, Artificial Neural Networks, Random Forest, XGBoost, Gradient Boosting, Hard and Soft Voting Ensembles, and BERT-based transformer models (BERT, XLM-RoBERTa, DistilBERT, Electra). In addition, three supervised filter-based feature selection methods (Information Gain, Fisher Score, and Chi-Squared) are used to assess their effect on classification performance. The experimental results show that Gradient Boosting achieves the highest F1 score, up to 98.26%, particularly when combined with Fisher Score-based feature selection. Information Gain and Fisher Score both improve classification performance substantially because they capture how well each feature discriminates between the classes. The Chi-Squared method is comparatively less effective: the chi-squared test is designed for non-negative, count-like features, whereas the PCA-transformed attributes of this dataset are continuous and can take negative values.
Overall, the findings demonstrate that clustering-assisted hybrid classification architectures provide superior accuracy and generalizability on high-dimensional, imbalanced datasets and offer a promising framework for developing intelligent fraud detection systems.
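To make the clustering-assisted idea concrete, here is a minimal pure-Python sketch under several assumptions: Euclidean distance, a simple alternating (Voronoi-iteration) k-medoids rather than PAM, farthest-first initialization, and a nearest-class-centroid rule standing in for the study's supervised classifiers (Gradient Boosting, XGBoost, and so on). The names `k_medoids`, `fit_hybrid`, and `predict_hybrid` and the toy data are hypothetical; the abstract does not say how test transactions are routed to clusters, so sending each point to its nearest medoid is also an assumption, not the authors' method.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points, k, max_iter=100):
    """Alternating (Voronoi-iteration) k-medoids. Returns medoid indices and cluster labels."""
    # Farthest-first initialization: start at point 0, then add the point farthest from all chosen medoids.
    medoids = [0]
    while len(medoids) < k:
        far = max(range(len(points)),
                  key=lambda i: min(euclidean(points[i], points[m]) for m in medoids))
        medoids.append(far)
    labels = []
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest medoid.
        labels = [min(range(k), key=lambda c: euclidean(p, points[medoids[c]])) for p in points]
        # Update step: the member with the smallest total distance to its cluster becomes the medoid.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster ends up empty
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members,
                                   key=lambda i: sum(euclidean(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


def fit_hybrid(X, y, k):
    """Cluster X with k-medoids, then fit one simple classifier per cluster."""
    medoids, labels = k_medoids(X, k)
    models = {}
    for c in range(k):
        # Group this cluster's members by class label.
        by_class = defaultdict(list)
        for xi, yi, lab in zip(X, y, labels):
            if lab == c:
                by_class[yi].append(xi)
        if by_class:
            # Hypothetical stand-in classifier: one centroid per class inside this cluster.
            models[c] = {cls: [sum(col) / len(col) for col in zip(*rows)]
                         for cls, rows in by_class.items()}
    return [X[m] for m in medoids], models


def predict_hybrid(medoid_points, models, x):
    """Route x to the nearest non-empty cluster, then apply that cluster's classifier."""
    c = min(models, key=lambda i: euclidean(x, medoid_points[i]))
    centroids = models[c]
    return min(centroids, key=lambda cls: euclidean(x, centroids[cls]))


# Toy 2-D data (illustrative only): two far-apart groups, each containing both classes.
X = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3),
     (100, 100), (100, 101), (101, 100), (103, 103), (103, 104), (104, 103)]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
medoid_points, models = fit_hybrid(X, y, k=2)
print(predict_hybrid(medoid_points, models, (103.5, 103.5)))  # expected: 0
```

The toy data shows why per-cluster models can help: the two groups assign opposite labels to their lower-left and upper-right points, so a single nearest-centroid model fitted on all twelve points would get identical class centroids (both at about (51.83, 51.83)) and could not separate the classes, while one model per cluster separates them easily.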

Keywords

Machine Learning, Fraud Detection, Classification Algorithms, Suspicious Behavior Detection, Data Analysis

Abstract (Turkish version)

In the face of the increasing complexity of financial fraud, early and accurate detection of suspicious behaviors in credit card transactions has become a critical requirement for security systems. This study proposes a hybrid model that integrates K-medoids-based clustering with supervised classification algorithms. The model divides the dataset into subgroups so that the classifiers are trained on more homogeneous patterns, with the aim of improving overall classification performance. The study uses a credit card dataset of 284,807 transactions made in Europe, anonymized through PCA transformations. The proportion of data used for training was set to 30%, 40%, and 80%, and the effect of these proportions on model performance was examined systematically. On the K-medoids-clustered data, Logistic Regression, Support Vector Machines, Artificial Neural Networks, Random Forest, XGBoost, Gradient Boosting, Hard/Soft Voting Ensembles, and BERT-based models (BERT, XLM-RoBERTa, DistilBERT, Electra) were run. In addition, the effect of the filter-based Information Gain, Fisher Score, and Chi-squared feature selection techniques on algorithm performance was analyzed in detail. The experimental results show that the Gradient Boosting algorithm achieved the best performance, reaching an F1 score of 98.26%. Information Gain and Fisher Score improved classification performance by modeling the separation between classes more effectively, whereas the Chi-squared method reached lower accuracy because of the continuous and transformed nature of the dataset. The findings show that clustering-assisted hybrid classification approaches deliver high accuracy and generalizability on imbalanced, high-dimensional datasets and provide an effective framework for developing decision support systems for fraud detection.
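Fisher Score, one of the filter criteria named above, rates each feature by the ratio of between-class to within-class scatter. The sketch below is a minimal pure-Python version of the common formulation F_j = Σ_c n_c (μ_{c,j} − μ_j)² / Σ_c n_c σ²_{c,j}, where n_c is the class size, μ_{c,j} and σ²_{c,j} are the class mean and (population) variance of feature j, and μ_j is its overall mean. The function names `fisher_scores` and `top_features` and the toy data are hypothetical illustrations; the abstract does not give the study's exact scoring variant or preprocessing.

```python
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Fisher Score of each feature: between-class scatter divided by within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = fmean(col)  # overall mean of feature j
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            n_c = len(vals)
            between += n_c * (fmean(vals) - mu) ** 2  # n_c * (class mean - overall mean)^2
            within += n_c * pvariance(vals)           # n_c * class variance
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if class means differ, useless otherwise.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_features(scores, k):
    """Indices of the k highest-scoring features (filter-style selection)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (illustrative only): feature 0 separates the classes, feature 1 has equal class means.
X_demo = [(0.0, 5.0), (0.2, 1.0), (0.1, 3.0), (1.0, 4.0), (1.2, 2.0), (1.1, 3.0)]
y_demo = [0, 0, 0, 1, 1, 1]
scores = fisher_scores(X_demo, y_demo)
print(scores, top_features(scores, 1))
```

Here feature 0 scores about 37.5 and feature 1 scores 0, because its class means coincide. Unlike the chi-squared test, this criterion works directly on continuous, signed values; for comparison, scikit-learn's `sklearn.feature_selection.chi2` rejects inputs containing negative values, which is one concrete way PCA-transformed features clash with that test.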

Keywords (Turkish version)

Suspicious Behavior Detection, Fraud Detection, Boosting, BERT, k-medoids Clustering

The article lists 38 references in total.

Details

Primary Language	Turkish
Subjects	Neural Networks, Reinforcement Learning
Section	Research Article
Authors	Merve Pınar (ORCID 0000-0003-3041-6958), Ayşe Berna Altınel Girgin (ORCID 0000-0001-5544-0925)
Submission Date	20 February 2025
Acceptance Date	24 April 2025
Early View Date	30 June 2025
Publication Date	30 June 2025
DOI	https://doi.org/10.24012/dumf.1643370
IZ	https://izlik.org/JA77UJ48LS
Issue	Year 2025, Volume: 16, Issue: 2

Cite

IEEE	[1] M. Pınar and A. B. Altınel Girgin, “k-medoids Kümeleme, Boosting ve BERT algoritmalarıyla Kredi Kartlarındaki Şüpheli Davranışların Tespit Edilmesi,” DÜMF MD, vol. 16, no. 2, pp. 345–356, Jun. 2025, doi: 10.24012/dumf.1643370.

All articles published by DUJE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided that the original work and its source are properly credited.