Research Article

Machine Learning based Early Prediction of Type 2 Diabetes: A New Hybrid Feature Selection Approach using Correlation Matrix with Heatmap and SFS

Year 2022, Volume: 10 Issue: 2, 110 - 117, 30.04.2022
https://doi.org/10.17694/bajece.973129

Abstract

This study introduces and details a new hybrid machine learning method for predicting type 2 diabetes and compares its outcomes with those of similar studies. Early prediction of diabetes is crucial: it allows necessary measures to be taken (e.g., changing eating habits and controlling patient weight), can defer the onset of diabetes, may reduce the death rate to some extent, and eases medical care professionals’ decision-making in preventing and managing diabetes mellitus. The purpose of this study is to create a new hybrid feature selection approach that combines a Correlation Matrix with Heatmap and Sequential Forward Selection (SFS) to reveal the most effective features for detecting diabetes. A diabetes data set with 520 instances and seven features was studied using the proposed hybrid feature selection approach. The selected optimal features were evaluated with Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Network (ANN) classifiers. Across the five evaluation metrics, ANN showed the best performance: Accuracy (99.1%), F-measure (99.1%), Precision (99.3%), Recall (99.1%), and AUC (99.2%). The proposed hybrid feature selection model thus performed better with ANN than with the other machine learning algorithms.
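
The two-stage selection described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the data are synthetic, the 0.9 correlation threshold and the target of two features are hypothetical choices, and a simple nearest-centroid classifier stands in for the SVM, RF, and ANN models used in the study. The correlation matrix that a heatmap would visualize is computed pairwise here and used only to discard redundant features before SFS runs.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def correlation_filter(X, threshold=0.9):
    """Stage 1: keep a feature only if |r| with every kept feature is below threshold."""
    cols = list(zip(*X))
    kept = []
    for j in range(len(cols)):
        if all(abs(pearson(cols[j], cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept

def centroid_accuracy(X, y, features, folds=5):
    """k-fold accuracy of a nearest-centroid classifier (stand-in model)."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        centroids = {}
        for label in set(y):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
        for i in test:
            pred = min(centroids, key=lambda c: sum(
                (X[i][j] - m) ** 2 for j, m in zip(features, centroids[c])))
            correct += pred == y[i]
    return correct / len(X)

def sequential_forward_selection(X, y, candidates, k):
    """Stage 2: greedily add the candidate that gives the best cross-validated accuracy."""
    selected = []
    while len(selected) < k:
        remaining = [j for j in candidates if j not in selected]
        best = max(remaining, key=lambda j: centroid_accuracy(X, y, selected + [j]))
        selected.append(best)
    return selected

# Synthetic data (assumption): 520 rows, five features, binary label.
rng = random.Random(0)
X, y = [], []
for i in range(520):
    label = i % 2
    f0 = 3.0 * label + rng.gauss(0, 1)   # strongly informative
    f1 = f0 + rng.gauss(0, 0.1)          # near-duplicate of f0 (redundant)
    f2 = 1.5 * label + rng.gauss(0, 1)   # weakly informative
    f3 = rng.gauss(0, 1)                 # noise
    f4 = rng.gauss(0, 1)                 # noise
    X.append([f0, f1, f2, f3, f4])
    y.append(label)

kept = correlation_filter(X, threshold=0.9)
selected = sequential_forward_selection(X, y, kept, k=2)
score = centroid_accuracy(X, y, selected)
print("kept after correlation filter:", kept)
print("selected by SFS:", selected)
print("cross-validated accuracy:", round(score, 3))
```

Running the filter first keeps SFS from spending evaluations on features that carry nearly the same information. In practice a library implementation such as scikit-learn's SequentialFeatureSelector would likely replace the hand-written loop.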


Details

Primary Language English
Subjects Artificial Intelligence
Journal Section Research Article
Authors

Selim Buyrukoğlu 0000-0001-7844-3168

Ayhan Akbaş 0000-0002-6425-104X

Publication Date April 30, 2022
Published in Issue Year 2022 Volume: 10 Issue: 2

Cite

APA Buyrukoğlu, S., & Akbaş, A. (2022). Machine Learning based Early Prediction of Type 2 Diabetes: A New Hybrid Feature Selection Approach using Correlation Matrix with Heatmap and SFS. Balkan Journal of Electrical and Computer Engineering, 10(2), 110-117. https://doi.org/10.17694/bajece.973129

All articles published by BAJECE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit and adapt the work provided the original work and source are appropriately cited.