TY - JOUR
T1 - Enhancing Multi-Disease Prediction with Machine Learning: A Comparative Analysis and Hyperparameter Optimization Approach
AU - Atasoy, Ferhat
AU - Bechir, Mariam Kili
PY - 2025
DA - March
Y2 - 2025
DO - 10.29109/gujsc.1489959
JF - Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji
JO - GUJS Part C
PB - Gazi Üniversitesi
WT - DergiPark
SN - 2147-9526
SP - 367
EP - 381
VL - 13
IS - 1
LA - en
AB - Although traditional methods based on statistical parameters remain important in healthcare, machine learning (ML) algorithms offer promising results for analyzing health data. The presented work therefore aimed to evaluate the success of several supervised ML models with hyperparameter optimization (HPO) for predicting multiple diseases: diabetes, heart disease, Parkinson's disease, and breast cancer. We evaluated seven distinct algorithms: Logistic Regression (LR), Gradient Boosting (GB), k-Nearest Neighbors (k-NN), Extreme Gradient Boosting (XGB), Support Vector Machines (SVM), Random Forests (RF), and a basic artificial neural network (a nonlinear mapping technique). Each algorithm was trained and compared in isolation for each targeted health condition. The success of these techniques was assessed using standard performance metrics: accuracy, precision, recall, and F1-score. Additionally, hyperparameter optimization was applied to each algorithm and its effect on the results was observed. The results show the potential of ML for multi-disease prediction, with individual models achieving high accuracy for specific diseases: SVM achieved 100% accuracy for heart disease, Gradient Boosting achieved 90% for diabetes, a simple neural network achieved 99% for breast cancer, and Random Forest achieved 100% for Parkinson's disease. These results emphasize the importance of selecting appropriate models for specific disease prediction tasks. A web-based application has been developed so that users can easily use the models by selecting a disease, providing the relevant input, and receiving a prediction from the chosen model. In conclusion, this study highlights the potential of machine learning and hyperparameter optimization for multi-disease prediction and underlines the importance of model selection.
KW - Machine Learning
KW - Artificial Neural Network
KW - Supervised Learning
KW - Multi-Disease Prediction
KW - Hyperparameter Optimization
KW - User-Friendly Application
CR - [1] N. Aydın Atasoy and F. Çakmak, "Web Tabanlı Sürücü Davranışları Analiz Uygulaması [Web-Based Driver Behavior Analysis Application]," Gazi Journal of Engineering Sciences, vol. 7, no. 3, pp. 264–276, Dec. 2021, doi: 10.30855/gmbd.2021.03.09.
CR - [2] E. Dikbıyık, Ö. Demir, and B. Doğan, "Derin Öğrenme Yöntemleri İle Konuşmadan Duygu Tanıma Üzerine Bir Literatür Araştırması [A Literature Survey on Speech Emotion Recognition with Deep Learning Methods]," Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji, vol. 10, no. 4, pp. 765–791, Dec. 2022, doi: 10.29109/gujsc.1111884.
CR - [3] Ö. Tonkal and H. Polat, "Traffic Classification and Comparative Analysis with Machine Learning Algorithms in Software Defined Networks," Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji, vol. 9, no. 1, pp. 71–83, Mar. 2021, doi: 10.29109/gujsc.869418.
CR - [4] M. B. Er, "Akciğer Seslerinin Derin Öğrenme İle Sınıflandırılması [Classification of Lung Sounds with Deep Learning]," Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji, vol. 8, no. 4, pp. 830–844, Dec. 2020, doi: 10.29109/gujsc.758325.
CR - [5] R. Alanazi, "Identification and Prediction of Chronic Diseases Using Machine Learning Approach," J Healthc Eng, vol. 2022, 2022, doi: 10.1155/2022/2826127.
CR - [6] I. D. Mienye, Y. Sun, and Z. Wang, "An improved ensemble learning approach for the prediction of heart disease risk," Inform Med Unlocked, vol. 20, Jan. 2020, doi: 10.1016/j.imu.2020.100402.
CR - [7] S. Dhabarde, R. Mahajan, S. Mishra, S. Chaudhari, S. Manelu, and N. S. Shelke, "Disease Prediction Using Machine Learning Algorithms." [Online]. Available: www.irjmets.com
CR - [8] S. Vilas and A. M. S. Scholar, "Diseases Prediction Model using Machine Learning Technique," doi: 10.32628/IJSRST.
CR - [9] A. Mujumdar and V. Vaidehi, "Diabetes Prediction using Machine Learning Algorithms," in Procedia Computer Science, Elsevier B.V., 2019, pp. 292–299, doi: 10.1016/j.procs.2020.01.047.
CR - [10] T. H. H. Aldhyani, A. S. Alshebami, and M. Y. Alzahrani, "Soft Clustering for Enhancing the Diagnosis of Chronic Diseases over Machine Learning Algorithms," J Healthc Eng, vol. 2020, 2020, doi: 10.1155/2020/4984967.
CR - [11] S. F. Weng, J. Reps, J. Kai, J. M. Garibaldi, and N. Qureshi, "Can machine-learning improve cardiovascular risk prediction using routine clinical data?," PLoS One, vol. 12, no. 4, Apr. 2017, doi: 10.1371/journal.pone.0174944.
CR - [12] S. Nusinovici et al., "Logistic regression was as good as machine learning for predicting major chronic diseases," J Clin Epidemiol, vol. 122, pp. 56–69, Jun. 2020, doi: 10.1016/j.jclinepi.2020.03.002.
CR - [13] J. Al Nahian, A. K. M. Masum, S. Abujar, and M. J. Mia, "Common human diseases prediction using machine learning based on survey data," Bulletin of Electrical Engineering and Informatics, vol. 11, no. 6, pp. 3498–3508, Dec. 2022, doi: 10.11591/eei.v11i6.3405.
CR - [14] N. Aydın Atasoy and A. Faris Abdulla Al Rahhawi, "Examining the classification performance of pre-trained capsule networks on imbalanced bone marrow cell dataset," International Journal of Imaging Systems and Technology, vol. 34, no. 3, May 2024, doi: 10.1002/ima.23067.
CR - [15] J. Bergstra and Y. Bengio, "Random Search for Hyper-Parameter Optimization," Journal of Machine Learning Research, vol. 13, pp. 281–305, 2012.
CR - [16] M. Claesen and B. De Moor, "Hyperparameter Search in Machine Learning," Feb. 2015. [Online]. Available: http://arxiv.org/abs/1502.02127
CR - [17] Y. A. Ali, E. M. Awwad, M. Al-Razgan, and A. Maarouf, "Hyperparameter Search for Machine Learning Algorithms for Optimizing the Computational Complexity," Processes, vol. 11, no. 2, Feb. 2023, doi: 10.3390/pr11020349.
CR - [18] A. E. W. Johnson et al., "MIMIC-III, a freely accessible critical care database," Sci Data, vol. 3, May 2016, doi: 10.1038/sdata.2016.35.
CR - [19] J. Snoek, H. Larochelle, and R. P. Adams, "Practical Bayesian Optimization of Machine Learning Algorithms," in Advances in Neural Information Processing Systems, 2012.
CR - [20] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
CR - [21] M. Çolak, T. Tümer Sivri, N. Pervan Akman, A. Berkol, and Y. Ekici, "Disease prognosis using machine learning algorithms based on new clinical dataset," Communications Faculty of Sciences University of Ankara Series A2-A3 Physical Sciences and Engineering, vol. 65, no. 1, pp. 52–68, Jun. 2023, doi: 10.33769/aupse.1215962.
CR - [22] F. A. Latifah, I. Slamet, and Sugiyanto, "Comparison of heart disease classification with logistic regression algorithm and random forest algorithm," AIP Conf Proc, vol. 2296, Nov. 2020, doi: 10.1063/5.0030579.
CR - [23] R. Valarmathi and T. Sheela, "Heart disease prediction using hyper parameter optimization (HPO) tuning," Biomed Signal Process Control, vol. 70, p. 103033, Sep. 2021, doi: 10.1016/j.bspc.2021.103033.
CR - [24] M. Feurer and F. Hutter, "Hyperparameter Optimization," in Automated Machine Learning, 2019, pp. 3–33, doi: 10.1007/978-3-030-05318-5_1.
CR - [25] B. Bischl, J. Richter, J. Bossek, D. Horn, J. Thomas, and M. Lang, "mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions," Mar. 2017. [Online]. Available: http://arxiv.org/abs/1703.03373
CR - [26] G. Luo, "A review of automatic selection methods for machine learning algorithms and hyper-parameter values," Network Modeling Analysis in Health Informatics and Bioinformatics, vol. 5, no. 1, Dec. 2016, doi: 10.1007/s13721-016-0125-6.
CR - [27] P. Probst and B. Bischl, "Tunability: Importance of Hyperparameters of Machine Learning Algorithms," Journal of Machine Learning Research, vol. 20, 2019. [Online]. Available: http://jmlr.org/papers/v20/18-444.html
CR - [28] L. Yang and A. Shami, "On hyperparameter optimization of machine learning algorithms: Theory and practice," Neurocomputing, vol. 415, pp. 295–316, Nov. 2020, doi: 10.1016/j.neucom.2020.07.061.
CR - [29] D. J. Hand, "Measuring classifier performance: A coherent alternative to the area under the ROC curve," Mach Learn, vol. 77, no. 1, pp. 103–123, Oct. 2009, doi: 10.1007/s10994-009-5119-5.
CR - [30] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics. Springer, 2009.
CR - [31] A. Y. Ng, "Feature selection, L1 vs. L2 regularization, and rotational invariance," in Twenty-First International Conference on Machine Learning - ICML '04, New York, NY, USA: ACM Press, 2004, p. 78, doi: 10.1145/1015330.1015435.
CR - [32] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," J R Stat Soc Series B Stat Methodol, vol. 67, no. 2, pp. 301–320, 2005, doi: 10.1111/j.1467-9868.2005.00503.x.
CR - [33] F. Pedregosa et al., "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
CR - [34] G. C. Cawley and N. L. C. Talbot, "On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation," Journal of Machine Learning Research, vol. 11, pp. 2079–2107, 2010.
CR - [35] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
CR - [36] L. Breiman, "Random Forests," Machine Learning, vol. 45, pp. 5–32, 2001, doi: 10.1023/A:1010933404324.
CR - [37] P. Probst, M. N. Wright, and A. L. Boulesteix, "Hyperparameters and tuning strategies for random forest," in Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 9, no. 3, Wiley-Blackwell, 2019, doi: 10.1002/widm.1301.
CR - [38] T. M. Oshiro, P. S. Perez, and J. A. Baranauskas, "How Many Trees in a Random Forest?," 2012, pp. 154–168, doi: 10.1007/978-3-642-31537-4_13.
CR - [39] G. Biau and E. Scornet, "A random forest guided tour," Test, vol. 25, no. 2, pp. 197–227, Jun. 2016, doi: 10.1007/s11749-016-0481-7.
CR - [40] G. Louppe, "Understanding Random Forests: From Theory to Practice," Jul. 2014. [Online]. Available: http://arxiv.org/abs/1407.7502
CR - [41] C. Cortes and V. Vapnik, "Support-Vector Networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
CR - [42] B. Schölkopf and A. J. Smola, "Kernel Methods," in Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2001, pp. 405–406.
CR - [43] C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machines," 2001. [Online]. Available: www.csie.ntu.edu.tw/
CR - [44] M. M. Deza and E. Deza, Encyclopedia of Distances. Springer Berlin Heidelberg, 2009, doi: 10.1007/978-3-642-00234-2.
CR - [45] S. A. Dudani, "The Distance-Weighted k-Nearest-Neighbor Rule," IEEE Trans Syst Man Cybern, vol. SMC-6, no. 4, pp. 325–327, 1976, doi: 10.1109/TSMC.1976.5408784.
CR - [46] R. J. Samworth, "Optimal weighted nearest neighbour classifiers," Ann Stat, vol. 40, no. 5, pp. 2733–2763, Oct. 2012, doi: 10.1214/12-AOS1049.
CR - [47] J. H. Friedman, "Stochastic gradient boosting," Computational Statistics & Data Analysis, vol. 38, no. 4, pp. 367–378, 2002. [Online]. Available: www.elsevier.com/locate/csda
CR - [48] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, Aug. 2016, pp. 785–794, doi: 10.1145/2939672.2939785.
CR - [49] T. Chen and T. He, "xgboost: eXtreme Gradient Boosting," R package, 2024.
CR - [50] A. Natekin and A. Knoll, "Gradient boosting machines, a tutorial," Front Neurorobot, vol. 7, Dec. 2013, doi: 10.3389/fnbot.2013.00021.
CR - [51] Proceedings of the International Conference on Computing, Communication and Automation (ICCCA 2016), Galgotias University School of Computing Science and Engineering and IEEE Uttar Pradesh Section, 29–30 April 2016.
CR - [52] S. F. Weng, J. Reps, J. Kai, J. M. Garibaldi, and N. Qureshi, "Can machine-learning improve cardiovascular risk prediction using routine clinical data?," PLoS One, vol. 12, no. 4, Apr. 2017, doi: 10.1371/journal.pone.0174944.
CR - [53] S. Nusinovici et al., "Logistic regression was as good as machine learning for predicting major chronic diseases," J Clin Epidemiol, vol. 122, pp. 56–69, Jun. 2020, doi: 10.1016/j.jclinepi.2020.03.002.
CR - [54] W. Wang, J. Lee, F. Harrou, and Y. Sun, "Early Detection of Parkinson's Disease Using Deep Learning and Machine Learning," IEEE Access, vol. 8, pp. 147635–147646, 2020, doi: 10.1109/ACCESS.2020.3016062.
CR - [55] E. Kabir Hashi and M. Shahid Uz Zaman, "Developing a Hyperparameter Tuning Based Machine Learning Approach of Heart Disease Prediction," Journal of Applied Science & Process Engineering, vol. 7, no. 2, 2020.
CR - [56] D. Hamid, S. S. Ullah, J. Iqbal, S. Hussain, C. A. U. Hassan, and F. Umar, "A Machine Learning in Binary and Multiclassification Results on Imbalanced Heart Disease Data Stream," J Sens, vol. 2022, 2022, doi: 10.1155/2022/8400622.
CR - [57] M. Wang, Z. Wei, M. Jia, L. Chen, and H. Ji, "Deep learning model for multi-classification of infectious diseases from unstructured electronic medical records," BMC Med Inform Decis Mak, vol. 22, no. 1, Dec. 2022, doi: 10.1186/s12911-022-01776-y.
CR - [58] C. Leys, C. Ley, O. Klein, P. Bernard, and L. Licata, "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median," J Exp Soc Psychol, vol. 49, no. 4, pp. 764–766, Jul. 2013, doi: 10.1016/j.jesp.2013.03.013.
CR - [59] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd ed. O'Reilly Media, Inc., 2019.
CR - [60] P. Refaeilzadeh, L. Tang, and H. Liu, "Cross-Validation," in Encyclopedia of Database Systems, 2009, pp. 532–538, doi: 10.1007/978-0-387-39940-9_565.
CR - [61] "View of SMOTE: Synthetic Minority Over-sampling Technique." Accessed: Feb. 05, 2025. [Online]. Available: https://www.jair.org/index.php/jair/article/view/10302/24590
UR - https://doi.org/10.29109/gujsc.1489959
L1 - https://dergipark.org.tr/tr/download/article-file/3956066
ER -