Research Article

A predictive machine learning framework for diabetes

Year 2024, Volume: 8 Issue: 3, 583 - 592, 28.07.2024
https://doi.org/10.31127/tuje.1434305

Abstract

Diabetes, a non-communicable disease, is associated with excess glucose in the bloodstream. In 2022, an estimated 422 million people were living with the disease globally. The impact of diabetes on the world economy was estimated at $1.31 trillion in 2015, and the disease was implicated in the deaths of 5 million adults between the ages of 20 and 79 years globally. If left untreated for an extended time, diabetes can result in a host of other health complications. Predictive models that supplement the diagnostic process and aid the early detection of diabetes are therefore important. The current study develops a machine learning framework for the prediction of diabetes, intended to aid medical practitioners in the early detection of the disease. The dataset used in this investigation was sourced from Kaggle. It consists of 100,000 entries, with 8,500 diabetics and 91,500 non-diabetics, and is therefore imbalanced. The dataset was modified to obtain a balanced dataset of 8,500 entries each for the diabetic and non-diabetic classes. Of the fifteen classifiers compared, the Gradient Boosting classifier (GBC), Adaptive Boosting classifier (ADA), and Light Gradient Boosting Machine (LGBM) were the three best performers. The proposed framework is a stacked model consisting of GBC, ADA, and LGBM, with the ADA classifier as the meta-model. This model achieved an average accuracy, area under the curve (AUC), recall, precision, and F1-score of 91.12 ± 0.75 %, 97.83 ± 0.29 %, 92.03 ± 1.55 %, 90.40 ± 1.01 %, and 91.12 ± 0.77 %, respectively. The selling point of the proposed framework is the high recall of 92.03 ± 1.55 %, indicating that the model is sensitive to both the diabetic and the non-diabetic classes.
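The abstract does not state how the dataset was balanced or how the metrics were computed. As an illustration only, the following minimal sketch assumes random undersampling of the majority class and the standard confusion-matrix definitions of accuracy, recall, precision, and F1-score. The function names, the label column name `diabetes`, and the toy data are assumptions made for this example, not details taken from the study.

```python
import random


def undersample(rows, label_key="diabetes", seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    `label_key` is an assumed column name; labels are 1 (diabetic) or 0.
    """
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def binary_metrics(y_true, y_pred):
    """Accuracy, recall, precision and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Toy data with the same 8.5 % positive rate as the dataset described above.
rows = [{"id": i, "diabetes": 1 if i < 85 else 0} for i in range(1000)]
balanced = undersample(rows)
print(len(balanced), sum(r["diabetes"] for r in balanced))

# Metrics on a small hand-made prediction vector.
print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

Undersampling discards most of the majority class, which is consistent with the reduction from 91,500 to 8,500 non-diabetic entries, although other balancing strategies could yield the same class counts.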

Ethical Statement

Not Applicable

Supporting Institution

Obafemi Awolowo University


There are 38 citations in total.

Details

Primary Language: English
Subjects: Data Communications
Journal Section: Articles
Authors:

Danjuma Maza (ORCID: 0000-0002-7079-2301)

Joshua Olufemi Ojo (ORCID: 0009-0002-5977-9613)

Grace Olubumi Akinlade (ORCID: 0000-0002-0974-5629)

Early Pub Date: July 15, 2024
Publication Date: July 28, 2024
Submission Date: February 9, 2024
Acceptance Date: April 17, 2024
Published in Issue: Year 2024, Volume: 8, Issue: 3

Cite

APA Maza, D., Ojo, J. O., & Akinlade, G. O. (2024). A predictive machine learning framework for diabetes. Turkish Journal of Engineering, 8(3), 583-592. https://doi.org/10.31127/tuje.1434305
AMA Maza D, Ojo JO, Akinlade GO. A predictive machine learning framework for diabetes. TUJE. July 2024;8(3):583-592. doi:10.31127/tuje.1434305
Chicago Maza, Danjuma, Joshua Olufemi Ojo, and Grace Olubumi Akinlade. “A Predictive Machine Learning Framework for Diabetes”. Turkish Journal of Engineering 8, no. 3 (July 2024): 583-92. https://doi.org/10.31127/tuje.1434305.
EndNote Maza D, Ojo JO, Akinlade GO (July 1, 2024) A predictive machine learning framework for diabetes. Turkish Journal of Engineering 8 3 583–592.
IEEE D. Maza, J. O. Ojo, and G. O. Akinlade, “A predictive machine learning framework for diabetes”, TUJE, vol. 8, no. 3, pp. 583–592, 2024, doi: 10.31127/tuje.1434305.
ISNAD Maza, Danjuma et al. “A Predictive Machine Learning Framework for Diabetes”. Turkish Journal of Engineering 8/3 (July 2024), 583-592. https://doi.org/10.31127/tuje.1434305.
JAMA Maza D, Ojo JO, Akinlade GO. A predictive machine learning framework for diabetes. TUJE. 2024;8:583–592.
MLA Maza, Danjuma et al. “A Predictive Machine Learning Framework for Diabetes”. Turkish Journal of Engineering, vol. 8, no. 3, 2024, pp. 583-92, doi:10.31127/tuje.1434305.
Vancouver Maza D, Ojo JO, Akinlade GO. A predictive machine learning framework for diabetes. TUJE. 2024;8(3):583-92.