Research Article

Risky Fraud Detection Using Machine Learning: Insurance Claims

Year 2024, Volume 5, Issue 1, pp. 39-56, 30.04.2024

Abstract

Fraud is a significant and widely recognized problem in the insurance sector. Because fraudulent claims impose a considerable financial burden on insurers, distinguishing legitimate claims from fraudulent ones is essential. Since manually examining every claim is impractical given the time and cost involved, the use of advanced technology becomes imperative. The aim of this study is to build a framework in which predictive models based on machine learning algorithms are used to detect fraud in the insurance industry. For the study, a dataset was prepared from the claim records of a private insurance company. Eleven predictive models (Ada Boost, Cat Boost, Decision Tree, Extremely Randomized Tree, Gradient Boosting, KNN, LightGBM, Random Forest, Stochastic Gradient Boosting (SGB), Support Vector Classification (SVC), and Voting Classifiers) are applied to develop a fraud detection mechanism. The algorithms are compared in terms of accuracy, and the algorithm yielding the best values is identified. GridSearchCV, the Confusion Matrix, and the Classification Report (Accuracy, Precision, Recall, and F1-Score) were used to compute and display all metrics. The Random Forest and Decision Tree algorithms outperformed the other models, both achieving the highest classification accuracy of 75.6%. The findings of this study provide a useful, foundational framework for fraud detection in the insurance sector that can support real-time problem solving.
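Every metric named in the abstract (accuracy, precision, recall and F1-score) can be derived from the four cells of a binary confusion matrix. This section does not show the author's code. The dependency-free Python sketch below is only an illustration and makes three assumptions:

- label `1` marks the fraud class;
- zero denominators are reported as 0.0;
- the function names `confusion_counts` and `binary_report` and the toy labels are hypothetical, not taken from the study.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating `positive` as the fraud class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # fraud correctly flagged
        elif p == positive:
            fp += 1  # legitimate claim wrongly flagged as fraud
        elif t == positive:
            fn += 1  # fraud missed
        else:
            tn += 1  # legitimate claim correctly passed
    return tp, fp, fn, tn


def binary_report(y_true: Sequence[int], y_pred: Sequence[int],
                  positive: int = 1) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the positive (fraud) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Assumption: a zero denominator yields 0.0 rather than raising an error.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels for ten claims (1 = fraud); not data from the study.
    y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
    print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 6)
    print(binary_report(y_true, y_pred))
    # A model that never flags fraud still reaches 0.7 accuracy, but 0.0 recall.
    print(binary_report(y_true, [0] * 10))
```

The last line shows why the per-class figures in a classification report matter alongside accuracy when fraudulent claims are the minority class. In scikit-learn, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` compute the same quantities.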

Thanks

I would like to thank everyone who contributed to the publication process, especially the referees and the editorial board.

Riskified Fraud Detection Using Machine Learning: Insurance Claims

Abstract

In the insurance industry, fraud is a significant and widely recognized challenge. Because fraudulent claims impose a substantial financial burden on insurers, it is crucial to distinguish legitimate claims from false ones. Since manually scrutinizing every claim is impractical because of the time and cost involved, employing advanced technology becomes imperative. This article examines the use of predictive models built on machine learning algorithms to analyze claim data. For the study, a dataset was prepared from the claim records of a private insurance company. Eleven predictive models (Ada Boost, Cat Boost, Decision Tree, Extremely Randomized Tree, Gradient Boosting, KNN, LightGBM, Random Forest, Stochastic Gradient Boosting (SGB), Support Vector Classification (SVC), and Voting Classifiers) are applied to develop a fraud detection mechanism. The algorithms are compared in terms of their accuracy scores, and the algorithm that gives the best values is identified. GridSearchCV, the Confusion Matrix, and the Classification Report (Accuracy, Precision, Recall, and F1-Score) were used to calculate and display all metrics. The Random Forest and Decision Tree Classifiers outperformed the other models, achieving the highest classification accuracy of 75.6%. The findings of this study are beneficial for fraud detection, and the underlying framework offers functionality for real-time problem solving in the insurance sector.
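The abstract lists a voting classifier among the eleven models but does not say which kind. It could use hard voting (a majority vote over predicted labels) or soft voting (averaging predicted fraud probabilities), and the combined base models are not named. The dependency-free Python sketch below shows both schemes as an illustration only. These details are assumptions, not taken from the study:

- the function names `hard_vote` and `soft_vote`;
- tie-breaking toward the smaller label;
- the 0.5 threshold;
- the toy model outputs.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority ("hard") vote across base models.

    predictions[m][i] is model m's predicted label for sample i.
    Assumption: ties are broken in favour of the smallest label.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict every sample")
    voted = []
    for i in range(n_samples):
        counts = Counter(model_preds[i] for model_preds in predictions)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


def soft_vote(fraud_probs: Sequence[Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average each model's P(fraud) per sample, then threshold (1 = fraud)."""
    if not fraud_probs:
        raise ValueError("need at least one model")
    n_models = len(fraud_probs)
    n_samples = len(fraud_probs[0])
    return [int(sum(p[i] for p in fraud_probs) / n_models >= threshold)
            for i in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical outputs of three base models on four claims.
    rf, dt, knn = [1, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1]
    print(hard_vote([rf, dt, knn]))                         # [1, 0, 1, 0]
    print(soft_vote([[0.9, 0.2], [0.4, 0.3], [0.6, 0.7]]))  # [1, 0]
```

In scikit-learn the corresponding estimator is `sklearn.ensemble.VotingClassifier`, whose `voting` parameter accepts `"hard"` or `"soft"`.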

There are 29 citations in total.

Details

Primary Language: English
Subjects: Banking and Insurance (Other)
Journal Section: Articles
Authors: Hakan Kaya (ORCID: 0000-0002-0812-4839)

Publication Date: April 30, 2024
Submission Date: February 9, 2024
Acceptance Date: April 2, 2024
Published in Issue: Year 2024, Volume: 5, Issue: 1

Cite

APA: Kaya, H. (2024). Riskified Fraud Detection Using Machine Learning: Insurance Claims. Malatya Turgut Özal Üniversitesi İşletme Ve Yönetim Bilimleri Dergisi, 5(1), 39-56.