Research Article

Comparative Performance Analysis of Selected Machine Learning Algorithms and the Stacking Ensemble Method for Prediction of the Type II Diabetes Disease

Volume: 11 Number: 3 September 30, 2024

Abstract

Diabetes is a prevalent non-communicable disease affecting many people globally. Common risk factors include obesity, age, lack of exercise, lifestyle, genetic factors, high blood pressure, and poor diet. Early identification of the condition can help prevent subsequent complications, including heart attacks, lower limb amputations, nerve damage, and blindness. Over the years, data mining and machine learning have become popular and successful methods for identifying numerous diseases, including diabetes, from clinical data. This study examines the principles and processes of the Naïve Bayes, Support Vector Machine, Logistic Regression, Decision Tree, and Random Forest algorithms for diabetes prediction, using the Scikit-learn library for the experiments. We then combine all five machine learning models into a single stacked ensemble model. Data preprocessing techniques such as scaling, missing data removal, dimensionality reduction, and balancing of the target class were applied to the Jos Urban Diabetes dataset used in this study. A comparison of the algorithms' performance across the evaluation metrics shows that the Support Vector Machine algorithm outperforms all others in Accuracy, Precision, Sensitivity, and Matthews Correlation Coefficient, with scores of 96.11%, 91.61%, 85.67%, and 82.59%, respectively, under 10-fold cross-validation. The Stacked Ensemble model achieved the best Area Under the Receiver Operating Characteristic Curve score, 98.47%, also under 10-fold cross-validation.
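The workflow summarized in the abstract can be illustrated with a minimal Scikit-learn sketch. It is not the authors' code: the data are synthetic (the Jos Urban Diabetes dataset is not reproduced here), and the hyperparameters, the number of retained components, and the choice of Logistic Regression as the final estimator are all assumptions made for illustration. Class balancing and missing data removal are omitted because the synthetic data need neither.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 300 samples, 8 features, imbalanced classes.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)

# The five base learners named in the abstract, with illustrative settings.
base_models = [
    ("nb", GaussianNB()),
    ("svm", SVC()),
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Stacking: out-of-fold predictions of the base learners feed a meta-learner.
# The meta-learner here (Logistic Regression) is an assumption.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Scaling and dimensionality reduction sit inside the pipeline, so they are
# refit on the training part of every fold and do not see the held-out part.
model = make_pipeline(StandardScaler(), PCA(n_components=5), stack)

# The metrics reported in the abstract (sensitivity is recall of the positive class).
scoring = {
    "accuracy": "accuracy",
    "precision": "precision",
    "sensitivity": "recall",
    "mcc": make_scorer(matthews_corrcoef),
    "roc_auc": "roc_auc",
}

# 10-fold stratified cross-validation, averaged over the folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = cross_validate(model, X, y, cv=cv, scoring=scoring)
mean_scores = {name: results[f"test_{name}"].mean() for name in scoring}

for name, value in mean_scores.items():
    print(f"{name}: {value:.3f}")
```

Replacing `stack` in the pipeline with any single base learner gives the per-algorithm scores needed for the kind of comparison the study reports.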

Details

Primary Language: English
Subjects: Machine Learning (Other)
Journal Section: Research Article
Publication Date: September 30, 2024
Submission Date: August 12, 2024
Acceptance Date: September 16, 2024
Published in Issue: Year 2024 Volume: 11 Number: 3

APA
Zoakah, N., Shey Nsang, A., Ajibesin, A., & Zoakah, A. (2024). Comparative Performance Analysis of Selected Machine Learning Algorithms and the Stacking Ensemble Method for Prediction of the Type II Diabetes Disease. Gazi University Journal of Science Part A: Engineering and Innovation, 11(3), 622-646. https://doi.org/10.54287/gujsa.1531997