Research Article

Regression Tree Approach to Estimation of Health Insurance Premium

Year 2023, Volume: 16 Issue: 2, 81 - 99, 31.12.2023

Abstract

This paper proposes an approach to predicting health insurance premiums that combines traditional generalized linear models (GLM) with machine-learning-based regression tree analysis. The study first fits a GLM to real complementary health insurance data to assess variable importance, focusing on the variables that have a large impact on premium estimates. It then investigates whether the variables identified as significant by the GLM are also identified as significant by regression tree analysis. In the machine learning application, the study also analyzes the effect of stratified sampling that follows the structure of the data with respect to the risk variables used in premium forecasts. The study contributes to the actuarial understanding of premium estimation and gives insurers a concrete framework for navigating complex health insurance data. By bringing together the advantages of GLM and regression trees, it offers insurers a comprehensive comparison that can help them adapt to changing risk factors. The study is an innovative attempt to apply a regression tree methodology to insurance analysis, providing a novel and accurate estimation of premium amounts.
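To make the two machine learning ingredients named in the abstract concrete, the following is a minimal Python sketch, not the author's implementation. It shows (a) a stratified train/test split that preserves the share of each level of a risk variable and (b) the single-split search at the core of a regression tree, which picks the threshold that minimizes the within-node sum of squared errors. The portfolio, the field names (`age`, `plan`, `cost`), and the helper functions `stratified_split` and `best_split` are all hypothetical, and the GLM stage is not shown.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_split(rows, key, test_frac=0.25, seed=0):
    """Split rows so that each stratum (value of `key`) keeps the same share in train and test."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    train, test = [], []
    for group in strata.values():
        group = group[:]                      # copy before shuffling
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, feature, target):
    """Return (threshold, sse_after_split) for the best single split on a numeric feature."""
    levels = sorted({r[feature] for r in rows})
    best = (None, sse([r[target] for r in rows]))   # baseline: no split
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2                           # midpoint between adjacent values
        left = [r[target] for r in rows if r[feature] <= t]
        right = [r[target] for r in rows if r[feature] > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best


# Hypothetical portfolio: 200 policies, 10% on a "premium" plan,
# with claim cost jumping for policyholders aged 50 and over.
rows = []
for i in range(200):
    age = 20 + (i % 50)
    plan = "premium" if i % 10 == 0 else "basic"
    cost = 100.0 + (300.0 if age >= 50 else 0.0) + (50.0 if plan == "premium" else 0.0)
    rows.append({"age": age, "plan": plan, "cost": cost})

train, test = stratified_split(rows, key="plan")
threshold, _ = best_split(train, feature="age", target="cost")
print("share of premium plans in test:", sum(r["plan"] == "premium" for r in test) / len(test))
print("first split on age at:", threshold)
```

A full regression tree would apply `best_split` recursively over all candidate variables and prune the result. In practice a library implementation would normally be used instead of this hand-written search.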

Turkish title: Sağlık Sigortası Primi Tahmininde Regresyon Ağacı Yaklaşımı (Regression Tree Approach to Estimation of Health Insurance Premium)

There are 51 citations in total.

Details

Primary Language English
Subjects Statistical Analysis, Risk Analysis
Journal Section Articles
Authors

Başak Bulut Karageyik 0000-0003-4080-9165

Early Pub Date December 29, 2023
Publication Date December 31, 2023
Submission Date December 4, 2023
Acceptance Date December 28, 2023
Published in Issue Year 2023 Volume: 16 Issue: 2

Cite

IEEE B. Bulut Karageyik, “Regression Tree Approach to Estimation of Health Insurance Premium”, JSSA, vol. 16, no. 2, pp. 81–99, 2023.