TY  - JOUR
T1  - A Balanced Machine Learning Approach to Obesity Risk Classification: Comparative Analysis and Feature Importance
AU  - Koç, Haydar
AU  - Koc, Tuba
PY  - 2025
DA  - December
Y2  - 2025
DO  - 10.52148/ehta.1768556
JF  - Eurasian Journal of Health Technology Assessment
JO  - EHTA
PB  - Sağlık Bakanlığı Sağlık Hizmetleri Genel Müdürlüğü
WT  - DergiPark
SN  - 2587-0122
SP  - 90
EP  - 107
VL  - 9
IS  - 2
LA  - en
AB  - Obesity is a growing public health concern, particularly among university students who are exposed to lifestyle changes, disordered eating habits, and reduced physical activity. The aim of this study is to classify obesity risk levels among university students using machine learning classification methods and to identify the most influential factors associated with this risk. The study sample consisted of data collected from 445 students studying at Çankırı Karatekin University. Eight machine learning algorithms (Logistic Regression, Random Forest, Extra Trees, Support Vector Machines, K-Nearest Neighbor, Quadratic Discriminant Analysis, Naive Bayes, and Multilayer Perceptron) were compared to classify obesity risk. Class imbalance in the dataset was addressed using the Adaptive Synthetic Sampling (ADASYN) method, applied exclusively to the training set. The models were evaluated using standard performance metrics, and the highest accuracy (96.26%) was achieved by the Random Forest model, followed by Logistic Regression with 94.77% accuracy. Variable importance analysis indicated that age, internet use scale score, and fast-food consumption frequency were the most influential factors in classification, while the low correlation between variables (|r| < 0.2) suggested that model performance was driven by the combined contribution of multiple features. Overall, the findings demonstrate that the balanced machine learning approach, particularly ensemble-based methods, can classify obesity risk with high accuracy and provide valuable insights for targeted prevention strategies among university students.
KW  - Adaptive synthetic sampling
KW  - machine learning
KW  - obesity
KW  - young adults
CR  - 1. Akın, P. (2023). A new hybrid approach based on genetic algorithm and support vector machine methods for hyperparameter optimization in synthetic minority over-sampling technique (SMOTE). AIMS Mathematics, 8(6), 9400-9415.
CR  - 2. Alzahrani, S. H., Saeedi, A. A., Baamer, M. K., Shalabi, A. F., & Alzahrani, A. M. (2020). Eating habits among medical students at King Abdulaziz University, Jeddah, Saudi Arabia. International Journal of General Medicine, 77-88.
CR  - 3. Bikku, T. (2020). Multi-layered deep learning perceptron approach for health risk prediction. Journal of Big Data, 7(1), 50.
CR  - 4. Bishop, C. M., & Nasrabadi, N. M. (2006). Pattern recognition and machine learning (Vol. 4, No. 4, p. 738). New York: Springer.
CR  - 5. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
CR  - 6. Brownlee, J. (2020). Imbalanced classification with Python: better metrics, balance skewed classes, cost-sensitive learning. Machine Learning Mastery.
CR  - 7. Chatterjee, A., Gerdes, M. W., & Martinez, S. G. (2020). Identification of risk factors associated with obesity and overweight—a machine learning overview. Sensors, 20(9), 2734.
CR  - 8. Choudhuri, A. (2022). A hybrid machine learning model for estimation of obesity levels. In Data management, analytics and innovation conference (pp. 257-266). Springer. https://doi.org/10.1007/978-981-19-2600-6_22
CR  - 9. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
CR  - 10. Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27.
CR  - 11. Dirik, M. (2023). Application of machine learning techniques for obesity prediction: a comparative study. Journal of Complexity in Health Sciences, 6(2), 16-34.
CR  - 12. Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29(2), 103-130.
CR  - 13. Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., ... & Lautenbach, S. (2013). Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27-46.
CR  - 14. Ferdowsy, F., Rahi, K. S. A., Jabiullah, M. I., & Habib, M. T. (2021). A machine learning approach for obesity risk prediction. Current Research in Behavioral Sciences, 2, 100053.
CR  - 15. Fernández, A., García, S., Galar, M., Prati, R. C., Krawczyk, B., & Herrera, F. (2018). Learning from imbalanced data sets (Vol. 10, No. 2018, p. 4). Cham: Springer.
CR  - 16. Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 15(1), 3133-3181.
CR  - 17. Géron, A. (2022). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media, Inc.
CR  - 18. Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3-42.
CR  - 19. Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction.
CR  - 20. He, H., Bai, Y., Garcia, E. A., & Li, S. (2008, June). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence) (pp. 1322-1328). IEEE.
CR  - 21. Helforoush, Z., & Sayyad, H. (2024). Prediction and classification of obesity risk based on a hybrid metaheuristic machine learning approach. Frontiers in Big Data, 7, 1469981.
CR  - 22. Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression. John Wiley & Sons.
CR  - 23. Hruby, A., & Hu, F. B. (2015). The epidemiology of obesity: a big picture. Pharmacoeconomics, 33(7), 673-689.
CR  - 24. Kotsiantis, S., Kanellopoulos, D., & Pintelas, P. (2006). Handling imbalanced datasets: A review. GESTS International Transactions on Computer Science and Engineering, 30(1), 25-36.
CR  - 25. Musa, F., & Basaky, F. (2022). Obesity prediction using machine learning techniques. Journal of Applied Artificial Intelligence, 3(1), 24-33.
CR  - 26. Murtagh, F. (1991). Multilayer perceptrons for classification and regression. Neurocomputing, 2(5-6), 183-197.
CR  - 27. Naidu, G., Zuva, T., & Sibanda, E. M. (2023). A review of evaluation metrics in machine learning algorithms. In R. Silhavy & P. Silhavy (Eds.), Artificial Intelligence Application in Networks and Systems. CSOC 2023. Lecture Notes in Networks and Systems, vol 724. Springer, Cham. https://doi.org/10.1007/978-3-031-35314-7_2
CR  - 28. Nelson, M. C., Story, M., Larson, N. I., Neumark-Sztainer, D., & Lytle, L. A. (2008). Emerging adulthood and college-aged youth: an overlooked age for weight-related behavior change. Obesity.
CR  - 29. Olagunju, M. T., Aleru, E. O., Abodunrin, O. R., Adedini, C. B., Ola, O. M., Abel, C., ... & Akinsolu, F. T. (2024). Association between meal skipping and the double burden of malnutrition among university students. North African Journal of Food and Nutrition Research, 8(17), 167-177.
CR  - 30. Şengul, S., Lopcu, K., & Cam, S. (2020). Determinants of the obesity of adults in Turkey: An empirical study. Review of Applied Socio-Economic Research, 20(2), 60-71.
CR  - 31. Pendergast, F. J., Livingstone, K. M., Worsley, A., & McNaughton, S. A. (2016). Correlates of meal skipping in young adults: a systematic review. International Journal of Behavioral Nutrition and Physical Activity, 13(1), 125.
CR  - 32. Rish, I. (2001, August). An empirical study of the naive Bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence (Vol. 3, No. 22, pp. 41-46).
CR  - 33. Şahin, C., & Korkmaz, Ö. (2011). İnternet bağımlılığı ölçeğinin Türkçeye uyarlanması. Selçuk Üniversitesi Ahmet Keleşoğlu Eğitim Fakültesi Dergisi, 32(1), 101-115.
CR  - 34. World Health Organization. (2024). Obesity and overweight. https://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight
CR  - 35. Yağmur, N. (2024). A hybrid approach to obesity level determination with decision tree and pelican optimization algorithm. Journal of Scientific Reports-A, 57, 97-109. https://doi.org/10.59313/jsr-a.1447814
UR  - https://doi.org/10.52148/ehta.1768556
L1  - https://dergipark.org.tr/tr/download/article-file/5169582
ER  -