Research Article
Year 2023, Volume: 3 Issue: 1, 9 - 18, 30.06.2023

Prediction of healthcare insurance costs

Abstract

Machine learning (ML) is a branch of computational intelligence that can offer diverse solutions. Predicting medical insurance costs with ML methods remains a problem in the healthcare industry that must be investigated and improved. This study presents two approaches: the first uses ML algorithms to predict healthcare insurance costs, and the second uses Spark, a big data tool. In the first approach, the algorithms are the well-known linear regression and polynomial regression, selected based on the features of the input data. Linear regression models the relationship between two or more variables as linear, whereas polynomial regression models the relationship between the dependent and independent variables using a polynomial of the nth degree. In this work, we use a dataset from the Kaggle repository to analyze regression models that can predict the cost of medical insurance. The data are organized by essential features such as age, sex, BMI, region, number of children, smoking status, and charges. The results show that the polynomial regression model performs much better than the linear regression model and fits the data closely with respect to the target. This is because the task is non-linear, which makes it hard for a linear model to predict the output as desired. In the second approach, the work was carried out in a Jupyter notebook through interfacing tools, with the benefit that the code is very similar to Python ML code. In addition, the cell can be closed and the usual ML coding resumed in the same notebook. For this approach, the results show that the gradient-boosted tree regression model performs much better than the multivariate regression and random forest models, with R² = 0.9067. This is attributed to its sequential regression technique.
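
The contrast between linear and polynomial regression described in the abstract can be illustrated with a minimal sketch. It does not reproduce the study: the data below are synthetic and hypothetical (not the Kaggle insurance data), the quadratic ground truth and the helper names `r2_score` and `fit_and_score` are assumptions made for illustration, and NumPy's least-squares `polyfit` stands in for whatever tooling the authors used.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_and_score(x, y, degree):
    """Least-squares polynomial fit of the given degree; returns in-sample R^2."""
    coeffs = np.polyfit(x, y, deg=degree)
    return r2_score(y, np.polyval(coeffs, x))


# Synthetic, hypothetical data with a non-linear (quadratic) relationship.
# This is an assumption for illustration, NOT the dataset used in the study.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 200)
y = 2.0 * x**2 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

r2_linear = fit_and_score(x, y, degree=1)  # straight line
r2_poly = fit_and_score(x, y, degree=2)    # 2nd-degree polynomial

print(f"linear R^2 = {r2_linear:.3f}, quadratic R^2 = {r2_poly:.3f}")
```

On data like these, the straight line explains little of the variance while the quadratic fit explains most of it. Note that in-sample R² can never decrease when the polynomial degree is raised, so a fair comparison on real data would score both models on held-out samples.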

References

  • [1] J. L. Moran, P. J. Solomon, A. R. Peisach, and J. Martin, “New models for old questions: generalized linear models for cost prediction,” Journal of evaluation in clinical practice, vol. 13, no. 3, pp. 381–389, 2007.
  • [2] B. Lahiri and N. Agarwal, “Predicting healthcare expenditure increase for an individual from medicare data,” in Proceedings of the ACM SIGKDD workshop on health informatics, 2014, pp. 73–79.
  • [3] P. C. Austin, V. Goel, and C. van Walraven, “An introduction to multilevel regression models,” Canadian journal of public health, vol. 92, no. 2, pp. 150–154, 2001.
  • [4] A. I. Taloba, A. El-Aziz, M. Rasha, H. M. Alshanbari, and A.-A. H. El-Bagoury, “Estimation and prediction of hospitalization and medical care costs using regression in machine learning,” Journal of Healthcare Engineering, vol. 2022, 2022.
  • [5] J. Iqbal, S. Hussain, H. AlSalman, M. A. Mosleh, S. Sajid Ullah et al., “A computational intelligence approach for predicting medical insurance cost,” Mathematical Problems in Engineering, vol. 2021, 2021.
  • [6] Y. Nomura, Y. Ishii, Y. Chiba, S. Suzuki, A. Suzuki, S. Suzuki, K. Morita, J. Tanabe, K. Yamakawa, Y. Ishiwata et al., “Does last year’s cost predict the present cost? an application of machine leaning for the japanese area-basis public health insurance database,” International journal of environmental research and public health, vol. 18, no. 2, p. 565, 2021.
  • [7] A. Vimont, H. Leleu, and I. Durand-Zaleski, “Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in france,” The European Journal of Health Economics, vol. 23, no. 2, pp. 211–223, 2022.
  • [8] S. Sushmita, S. Newman, J. Marquardt, P. Ram, V. Prasad, M. D. Cock, and A. Teredesai, “Population cost prediction on public healthcare datasets,” in Proceedings of the 5th International Conference on Digital Health 2015, 2015, pp. 87–94.
  • [9] A. Gavhane, G. Kokkula, I. Pandya, and K. Devadkar, “Prediction of heart disease using machine learning,” in 2018 second international conference on electronics, communication and aerospace technology (ICECA). IEEE, 2018, pp. 1275–1278.
  • [10] S. Salloum, R. Dautov, X. Chen, P. X. Peng, and J. Z. Huang, “Big data analytics on apache spark,” International Journal of Data Science and Analytics, vol. 1, no. 3, pp. 145–164, 2016.
There are 10 citations in total.

Details

Primary Language English
Subjects Artificial Intelligence, Software Engineering (Other)
Journal Section Research Articles
Authors

Shoroog Albalawi 0009-0003-5506-9363

Lama Alshahrani 0009-0008-2661-0831

Nouf Albalawi 0009-0007-3798-0339

Rawan Alharbi 0000-0002-3044-1208

A'aeshah Alhakamy 0000-0002-0662-0185

Publication Date June 30, 2023
Acceptance Date February 20, 2023
Published in Issue Year 2023 Volume: 3 Issue: 1

Cite

Vancouver Albalawi S, Alshahrani L, Albalawi N, Alharbi R, Alhakamy A. Prediction of healthcare insurance costs. Computers and Informatics. 2023;3(1):9-18.