Research Article
Year 2024, Issue: 059, 58 - 69, 31.12.2024

Comparison of regression and tree-based methods for the prediction of zero-inflated claim data

Abstract

Pricing non-life insurance products is based on the prediction of two components: claim frequency and claim severity. In this study, we focus on claim frequency data with a zero-inflated structure. Although zero-modified regression models such as zero-inflated and hurdle models are used for datasets with excess zeros, machine learning (ML) methods have also been preferred for this type of data in recent years. When the objective is prediction, ML methods generally provide more accurate results than regression models, especially for large and complex datasets. Tree-based ML methods use decision trees as the base of the algorithm and improve performance by combining the predictions of multiple trees. Combining traditional methods with ML methods is a currently popular approach for prediction tasks. The objective of this study is to compare the predictive performance of regression methods and tree-based ML methods for zero-inflated claim frequency data using a real insurance dataset. Motor third-party liability insurance claim data from an insurance company in Turkey are used for the case study. To predict claim frequency, a generalized linear model (GLM), a zero-inflated model, and a hurdle model are used under the Poisson distribution as regression models, while regression trees, boosting, and GLM-Boost, which combines GLM with the boosting algorithm, are used as ML methods. The predictive performance of the candidate models is compared using both average in-sample and average out-of-sample losses. According to the case study results, the ML methods showed better predictive performance than the zero-modified models. In particular, the GLM-Boost method performed best among all candidates, which is a promising result for approaches that combine GLM and ML methods.
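To make the model classes named in the abstract concrete, the following is a minimal Python sketch of the zero-inflated Poisson and hurdle Poisson probability functions, together with an average loss. It is an illustration only, not the study's implementation: the function names, the parameters `pi` and `p0`, and the choice of the Poisson deviance as the loss are assumptions, since the abstract does not state which loss function was used.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k claims with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson (hypothetical parametrisation).

    With probability pi the count is a structural zero; otherwise it
    is drawn from Poisson(lam), which can also produce zeros.
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k, lam, p0):
    """Hurdle Poisson (hypothetical parametrisation).

    All zeros come from a single source with probability p0; positive
    counts follow a zero-truncated Poisson(lam).
    """
    if k == 0:
        return p0
    return (1.0 - p0) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


def poisson_deviance(y, mu):
    """Average Poisson deviance loss (assumed loss, see lead-in).

    y holds observed counts and mu holds strictly positive predicted
    means. The y * log(y / mu) term is taken as 0 when y == 0.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        term = mi - yi
        if yi > 0:
            term += yi * math.log(yi / mi)
        total += 2.0 * term
    return total / len(y)
```

Evaluating `poisson_deviance` on the data used for fitting gives an in-sample loss, and evaluating it on held-out data gives an out-of-sample loss, which mirrors the kind of comparison the abstract describes.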

References

  • [1] M. V. Wüthrich and C. Buser, "Data analytics for non-life insurance pricing," Swiss Finance Institute Research Paper, no. 16-68, 2023.
  • [2] Y. Liu, B.-J. Wang, and S.-G. Lv, "Using multi-class AdaBoost tree for prediction frequency of auto insurance," Journal of Applied Finance and Banking, vol. 4, no. 5, p. 45, 2014.
  • [3] A. Dal Pozzolo, G. Moro, G. Bontempi, and D. Y. A. Le Borgne, "Comparison of data mining techniques for insurance claim prediction," Universita degli Studi di Bologna, 2011.
  • [4] A. Noll, R. Salzmann, and M. V. Wüthrich, "Case study: French motor third-party liability claims," Available at SSRN 3164764, 2020.
  • [5] C. Clemente, G. R. Guerreiro, and J. M. Bravo, "Modelling Motor Insurance Claim Frequency and Severity Using Gradient Boosting," Risks, vol. 11, no. 9, p. 163, 2023.
  • [6] A. Ferrario and R. Hämmerli, "On boosting: Theory and applications," Available at SSRN 3402687, 2019.
  • [7] L. Guelman, "Gradient boosting trees for auto insurance loss cost modeling and prediction," Expert Systems with Applications, vol. 39, no. 3, pp. 3659-3667, 2012.
  • [8] R. Henckaerts, M.-P. Côté, K. Antonio, and R. Verbelen, "Boosting insights in insurance tariff plans with tree-based machine learning methods," North American Actuarial Journal, vol. 25, no. 2, pp. 255-285, 2021.
  • [9] B. So, "Enhanced gradient boosting for zero-inflated insurance claims and comparative analysis of CatBoost, XGBoost, and LightGBM," Scandinavian Actuarial Journal, pp. 1-23, 2024.
  • [10] P. Zhang, D. Pitt, and X. Wu, "A new multivariate zero-inflated hurdle model with applications in automobile insurance," ASTIN Bulletin: The Journal of the IAA, vol. 52, no. 2, pp. 393-416, 2022.
  • [11] I. N. El-Saeiti and G. Alomair, "Comparative Evaluation of Zero-Inflated and Hurdle Models for Balanced and Unbalanced Data: Performance Assessment and Model Fit Analysis," European Journal of Science, Innovation and Technology, vol. 3, no. 6, pp. 192-199, 2023.
  • [12] K. C. Yip and K. K. Yau, "On modeling claim frequency data in general insurance with extra zeros," Insurance: Mathematics and Economics, vol. 36, no. 2, pp. 153-163, 2005.
  • [13] J.-P. Boucher, M. Denuit, and M. Guillén, "Risk classification for claim counts: a comparative analysis of various zero-inflated mixed Poisson and hurdle models," North American Actuarial Journal, vol. 11, no. 4, pp. 110-131, 2007.
  • [14] J. A. Nelder and R. W. Wedderburn, "Generalized linear models," Journal of the Royal Statistical Society Series A: Statistics in Society, vol. 135, no. 3, pp. 370-384, 1972.
  • [15] J. Mullahy, "Specification and testing of some modified count data models," Journal of econometrics, vol. 33, no. 3, pp. 341-365, 1986.
  • [16] A. Zeileis, C. Kleiber, and S. Jackman, "Regression models for count data in R," Journal of statistical software, vol. 27, no. 8, pp. 1-25, 2008.
  • [17] T. Hastie, R. Tibshirani, and J. H. Friedman, "The elements of statistical learning: data mining, inference, and prediction". Springer, 2009.
  • [18] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," Annals of statistics, pp. 1189-1232, 2001.
  • [19] M. V. Wüthrich and M. Merz, "Statistical foundations of actuarial learning and its applications". Springer Nature, 2023.
  • [20] M. V. Wüthrich and M. Merz, "Yes, we CANN!," ASTIN Bulletin: The Journal of the IAA, vol. 49, no. 1, pp. 1-3, 2019.
  • [21] S. C. Lee, "Addressing imbalanced insurance data through zero-inflated Poisson regression with boosting," ASTIN Bulletin: The Journal of the IAA, vol. 51, no. 1, pp. 27-55, 2021.
There are 21 citations in total.

Details

Primary Language English
Subjects Statistical Data Science, Risk Analysis, Applied Statistics
Journal Section Research Articles
Authors

Aslıhan Şentürk Acar 0000-0002-1708-2028

Publication Date December 31, 2024
Submission Date September 2, 2024
Acceptance Date December 25, 2024
Published in Issue Year 2024 Issue: 059

Cite

IEEE A. Şentürk Acar, “Comparison of regression and tree-based methods for the prediction of zero-inflated claim data”, JSR-A, no. 059, pp. 58–69, December 2024.